I’m trying to investigate page table walk in Linux kernel. I used standard way to walk through page table to find PFN (just for example, not actual code):
pgd_t *pgd; pte_t *ptep; pte_t pte; pud_t *pud; pmd_t *pmd;
struct page *pagePtr = NULL;
struct mm_struct *mm = current->mm;
pgd = pgd_offset(mm, addr);
pud = getPud(pgd, addr);
pmd = pmd_offset(pud, addr);
ptep = pte_offset_map(pmd, addr);
size_t pfn = pte_pfn(pte);
The system is
CPU: Intel(R) Core(TM) i7-3770
CPU # 3.40GHz
OS: Linux Fedora release 22 (Twenty Two) Kernel: 4.4.4-200.fc22.x86_64
I’m trying to understand how pgd pointer dereferenced to pud pointer. I put simple code into getPud function:
noinline pud_t *getPud(pgd_t *pgdPtr, unsigned long addr).
{
return pud_offset(pgdPtr, addr);
}
And try to disassemble it by objdump
00000000000000b0 <getPud>:
b0: e8 00 00 00 00 callq b5 <getPud+0x5>
b5: 55 push %rbp
b6: 48 8b 3f mov (%rdi),%rdi
b9: 48 89 e5 mov %rsp,%rbp
bc: ff 14 25 00 00 00 00 callq *0x0
c3: 48 c1 ee 1b shr $0x1b,%rsi
c7: 48 ba 00 00 00 00 00 movabs $0xffff880000000000,%rdx
ce: 88 ff ff
d1: 81 e6 f8 0f 00 00 and $0xff8,%esi
d7: 48 01 d6 add %rdx,%rsi
da: 48 ba 00 f0 ff ff ff movabs $0x3ffffffff000,%rdx
e1: 3f 00 00
e4: 48 21 d0 and %rdx,%rax
e7: 48 01 f0 add %rsi,%rax
ea: 5d pop %rbp
eb: c3 retq
ec: 0f 1f 40 00 nopl 0x0(%rax)
My assembler knowledge is not enough to understand constructions like callq *0x0
Could someone please shed some light on what is going on in getPud?
Thank you
Sergey
Update 1
I used objdump to disassemble LKM (cpes.ko) module I created to walk through page table.
>objdump -dr ./cpes.ko
./cpes.ko: file format elf64-x86-64
Disassembly of section .text:
00000000000000b0 <getPud>:
b0: e8 00 00 00 00 callq b5 <getPud+0x5>
b1: R_X86_64_PC32 __fentry__-0x4
b5: 55 push %rbp
b6: 48 8b 3f mov (%rdi),%rdi
b9: 48 89 e5 mov %rsp,%rbp
bc: ff 14 25 00 00 00 00 callq *0x0
bf: R_X86_64_32S pv_mmu_ops+0xf8
c3: 48 c1 ee 1b shr $0x1b,%rsi
c7: 48 ba 00 00 00 00 00 movabs $0xffff880000000000,%rdx
ce: 88 ff ff
d1: 81 e6 f8 0f 00 00 and $0xff8,%esi
d7: 48 01 d6 add %rdx,%rsi
da: 48 ba 00 f0 ff ff ff movabs $0x3ffffffff000,%rdx
e1: 3f 00 00
e4: 48 21 d0 and %rdx,%rax
e7: 48 01 f0 add %rsi,%rax
ea: 5d pop %rbp
eb: c3 retq
ec: 0f 1f 40 00 nopl 0x0(%rax)
You're looking at disassembly of the .o, right? Not the final linked binary? The 0x0 address is just a placeholder that the linker will fill in. (This is a memory-indirect call through a static/global function pointer). pud_offset is getting inlined into your function.
Try objdump -dr or -dR to show the relocation entries in with the disassembly output.
Or better, look at gcc -S output to have symbolic names. (-fverbose-asm is sometimes helpful). Figure out the command line that make builds your file with, and modify it to use -S -o- instead of -c
Better look up source code. As you can see, there is some platform-specific address magic.
Related
I'm trying to understand elf relocation and there're couple of things that I don't really understand:
say I've got:
relamain.c
#include <stdio.h>
#include <stdlib.h>
#include "relafoo.c"
int main() {
int n;
scanf("%d",&n);
printf("\ngot %d, %d!=%d",n,n,factorial(n));
return 0;
}
and relafoo.c
int factorial(int n) {
if (n == 0 || n == 1) {
return 1;
}
return factorial(n-1)*n;
}
now in relamain.o readelf -r i see:
000000000027 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
000000000052 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
00000000005c 000c00000004 R_X86_64_PLT32 0000000000000000 __isoc99_scanf - 4
000000000066 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
how come i have two offsets for the same function(factorial)
i objdump -d relamain.o:
0000000000000031 <main>:
31: 55 push rbp
32: 48 89 e5 mov rbp,rsp
35: 48 83 ec 10 sub rsp,0x10
39: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
40: 00 00
42: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
46: 31 c0 xor eax,eax
48: 48 8d 45 f4 lea rax,[rbp-0xc]
4c: 48 89 c6 mov rsi,rax
4f: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 56 <main+0x25>
56: b8 00 00 00 00 mov eax,0x0
5b: e8 00 00 00 00 call 60 <main+0x2f>
60: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
63: 89 c7 mov edi,eax
65: e8 00 00 00 00 call 6a <main+0x39>
6a: 89 c1 mov ecx,eax
6c: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc]
6f: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
72: 89 c6 mov esi,eax
74: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 7b <main+0x4a>
7b: b8 00 00 00 00 mov eax,0x0
80: e8 00 00 00 00 call 85 <main+0x54>
85: b8 00 00 00 00 mov eax,0x0
8a: 48 8b 75 f8 mov rsi,QWORD PTR [rbp-0x8]
8e: 64 48 33 34 25 28 00 xor rsi,QWORD PTR fs:0x28
95: 00 00
97: 74 05 je 9e <main+0x6d>
99: e8 00 00 00 00 call 9e <main+0x6d>
9e: c9 leave
9f: c3 ret
Looking at the produced code i see that all the calls are not referring to 66 nor 27 which are the offsets for my factorial function, why is that? according to "learning linux binary analysis" by Ryan "elfmaster" O'Neill, I should expect that at least i should see call 66 or call 27, can anyone explain this?
if anyone can link me to a good book that explains everything in details with examples(beyond man ofc) dynamic linking and relocations it would be great
1- You don't have two offset for the same function. you have two offset to two locations when the relocations must be applied. In your example you have two call to the function factorial. One at offset 0x26 and the other at offset 0x66 and the offset to the relocation linked to these calls are at offset 0x27 and 0x67. The "00000000" at the offsets 0x27 and 0x66 will be replaced by a value calculated by the linker. you can see the dump of the executable to be sure.
2- When creating the object file. The assembler don't know factorial address , so it places "00000000" and place a relocation to tell the linker to replace these 0 by the value needed to get factorial since only the linker will know it exact location.
3- May be Linkers & Loaders by John R. Levine. however what i suggest you, is to start reading http://www.skyfree.org/linux/references/ELF_Format.pdf. Maybe it can be enough, depending on the level of understanding you seek.
I have a simple c file:
// filename: test.c
void fun() {}
Then I compile test.c to libtest.so using commands:
gcc -shared -fPIC -Wl,--gc-sections -ffunction-sections -fdata-sections -o libtest.so test.c
strip -s ./libtest.so
Then use readelf to print symbols and its size:
readelf -sW ./libtest.so
I got:
Symbol table '.dynsym' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000420 0 SECTION LOCAL DEFAULT 9
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
4: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize#GLIBC_2.2.5 (2)
5: 00000000002007c8 0 NOTYPE GLOBAL DEFAULT ABS _end
6: 00000000002007b8 0 NOTYPE GLOBAL DEFAULT ABS _edata
7: 00000000002007b8 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
8: 0000000000000420 0 FUNC GLOBAL DEFAULT 9 _init
9: 000000000000052a 6 FUNC GLOBAL DEFAULT 11 fun
10: 0000000000000568 0 FUNC GLOBAL DEFAULT 12 _fini
Then use objdump to disassemble .text section of libtest.so:
objdump -S -d -j .text ./libtest.so
I got:
./libtest.so: file format elf64-x86-64
Disassembly of section .text:
0000000000000460 <fun-0xca>:
460: 48 83 ec 08 sub $0x8,%rsp
464: 48 8b 05 15 03 20 00 mov 0x200315(%rip),%rax # 200780 <_fini+0x200218>
46b: 48 85 c0 test %rax,%rax
46e: 74 02 je 472 <__cxa_finalize#plt+0x2a>
470: ff d0 callq *%rax
472: 48 83 c4 08 add $0x8,%rsp
476: c3 retq
477: 90 nop
478: 90 nop
479: 90 nop
47a: 90 nop
47b: 90 nop
47c: 90 nop
47d: 90 nop
47e: 90 nop
47f: 90 nop
480: 55 push %rbp
481: 80 3d 30 03 20 00 00 cmpb $0x0,0x200330(%rip) # 2007b8 <__bss_start>
488: 48 89 e5 mov %rsp,%rbp
48b: 41 54 push %r12
48d: 53 push %rbx
48e: 75 62 jne 4f2 <__cxa_finalize#plt+0xaa>
490: 48 83 3d f8 02 20 00 cmpq $0x0,0x2002f8(%rip) # 200790 <_fini+0x200228>
497: 00
498: 74 0c je 4a6 <__cxa_finalize#plt+0x5e>
49a: 48 8d 3d 57 01 20 00 lea 0x200157(%rip),%rdi # 2005f8 <_fini+0x200090>
4a1: e8 a2 ff ff ff callq 448 <__cxa_finalize#plt>
4a6: 48 8d 1d 3b 01 20 00 lea 0x20013b(%rip),%rbx # 2005e8 <_fini+0x200080>
4ad: 4c 8d 25 2c 01 20 00 lea 0x20012c(%rip),%r12 # 2005e0 <_fini+0x200078>
4b4: 48 8b 05 05 03 20 00 mov 0x200305(%rip),%rax # 2007c0 <__bss_start+0x8>
4bb: 4c 29 e3 sub %r12,%rbx
4be: 48 c1 fb 03 sar $0x3,%rbx
4c2: 48 83 eb 01 sub $0x1,%rbx
4c6: 48 39 d8 cmp %rbx,%rax
4c9: 73 20 jae 4eb <__cxa_finalize#plt+0xa3>
4cb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
4d0: 48 83 c0 01 add $0x1,%rax
4d4: 48 89 05 e5 02 20 00 mov %rax,0x2002e5(%rip) # 2007c0 <__bss_start+0x8>
4db: 41 ff 14 c4 callq *(%r12,%rax,8)
4df: 48 8b 05 da 02 20 00 mov 0x2002da(%rip),%rax # 2007c0 <__bss_start+0x8>
4e6: 48 39 d8 cmp %rbx,%rax
4e9: 72 e5 jb 4d0 <__cxa_finalize#plt+0x88>
4eb: c6 05 c6 02 20 00 01 movb $0x1,0x2002c6(%rip) # 2007b8 <__bss_start>
4f2: 5b pop %rbx
4f3: 41 5c pop %r12
4f5: c9 leaveq
4f6: c3 retq
4f7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
4fe: 00 00
500: 48 83 3d e8 00 20 00 cmpq $0x0,0x2000e8(%rip) # 2005f0 <_fini+0x200088>
507: 00
508: 55 push %rbp
509: 48 89 e5 mov %rsp,%rbp
50c: 74 1a je 528 <__cxa_finalize#plt+0xe0>
50e: 48 8b 05 73 02 20 00 mov 0x200273(%rip),%rax # 200788 <_fini+0x200220>
515: 48 85 c0 test %rax,%rax
518: 74 0e je 528 <__cxa_finalize#plt+0xe0>
51a: 48 8d 3d cf 00 20 00 lea 0x2000cf(%rip),%rdi # 2005f0 <_fini+0x200088>
521: c9 leaveq
522: ff e0 jmpq *%rax
524: 0f 1f 40 00 nopl 0x0(%rax)
528: c9 leaveq
529: c3 retq
000000000000052a <fun>:
52a: 55 push %rbp
52b: 48 89 e5 mov %rsp,%rbp
52e: c9 leaveq
52f: c3 retq
530: 55 push %rbp
531: 48 89 e5 mov %rsp,%rbp
534: 53 push %rbx
535: 48 83 ec 08 sub $0x8,%rsp
539: 48 8b 05 90 00 20 00 mov 0x200090(%rip),%rax # 2005d0 <_fini+0x200068>
540: 48 83 f8 ff cmp $0xffffffffffffffff,%rax
544: 74 19 je 55f <fun+0x35>
546: 48 8d 1d 83 00 20 00 lea 0x200083(%rip),%rbx # 2005d0 <_fini+0x200068>
54d: 0f 1f 00 nopl (%rax)
550: 48 83 eb 08 sub $0x8,%rbx
554: ff d0 callq *%rax
556: 48 8b 03 mov (%rbx),%rax
559: 48 83 f8 ff cmp $0xffffffffffffffff,%rax
55d: 75 f1 jne 550 <fun+0x26>
55f: 48 83 c4 08 add $0x8,%rsp
563: 5b pop %rbx
564: c9 leaveq
565: c3 retq
We can tell that the size of symbol fun is 6 which is correspond to virtual address 0x52a ~ 0x52f.
I have two question:
what does symbol fun-0xca do?
what does assembly code from 0x530 to 0x565 in symbol fun do?
Omit the strip -s ./libtest.so.
In the GCC-created libtest.so, each separate function has a symbol in the symbol table. objdump -drwC -Mintel libtest.so will show names for each one, like _init, deregister_tm_clones, register_tm_clones, and __do_global_dtors_aux. These come from CRT startup code, I think; use gcc -v when you're linking to see any extra .o files it passes to ld.
Stripping symbols removes that information, leaving machine code in the text section without a symbol name. The only symbol left for objdump to reference is fun, so it labels the first block of code relative to that, as fun-0xca.
This question already has answers here:
Linux syscall, libc, VDSO and implementation dissection
(1 answer)
function calls from fork() to do_fork()
(1 answer)
Closed 3 years ago.
I am trying to understand how 32-bit socketcall work by reading the assembly code in socket API and a few others in Libc library.
000ed9f0 <socket>:
ed9f0: 89 da mov %ebx,%edx
ed9f2: b8 66 00 00 00 mov $0x66,%eax # socketcall syscall number
ed9f7: bb 01 00 00 00 mov $0x1,%ebx # SYS_SOCKET value
ed9fc: 8d 4c 24 04 lea 0x4(%esp),%ecx # pointer to the *arg structure
eda00: 65 ff 15 10 00 00 00 call *%gs:0x10 # invokes syscall? but this is not sysenter or int 0x80
eda07: 89 d3 mov %edx,%ebx
eda09: 83 f8 83 cmp $0xffffff83,%eax
eda0c: 73 01 jae eda0f <socket+0x1f>
eda0e: c3 ret
eda0f: e8 cb 8d 03 00 call 1267df <__frame_state_for+0x35f>
eda14: 81 c1 ec d5 0b 00 add $0xbd5ec,%ecx
eda1a: 8b 89 24 ff ff ff mov -0xdc(%ecx),%ecx
eda20: f7 d8 neg %eax
eda22: 65 03 0d 00 00 00 00 add %gs:0x0,%ecx
eda29: 89 01 mov %eax,(%ecx)
eda2b: 83 c8 ff or $0xffffffff,%eax
eda2e: c3 ret
eda2f: 90 nop
See my code comment above (#). It makes sense to me until this line:
eda00: 65 ff 15 10 00 00 00 call *%gs:0x10 # invokes syscall? but this is not Sysenter or int 0x80
I thought we invoke syscall using either int 0x80 or Sysenter. But how does this call with segment register invokes the socketcall syscall?
I've been reading and studying assembly code. Code is below
Disassembly of section .text:
08048510 <main>:
8048510: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048514: 83 e4 f0 and $0xfffffff0,%esp
8048517: ff 71 fc pushl -0x4(%ecx)
804851a: 55 push %ebp
804851b: 89 e5 mov %esp,%ebp
804851d: 51 push %ecx
804851e: 83 ec 08 sub $0x8,%esp
8048521: 68 e0 93 04 08 push $0x80493e0
8048526: 68 c0 93 04 08 push $0x80493c0
804852b: 68 c9 93 04 08 push $0x80493c9
8048530: e8 7a 07 00 00 call 8048caf <eos_printf>
8048535: c7 04 24 d6 93 04 08 movl $0x80493d6,(%esp)
804853c: e8 6e 07 00 00 call 8048caf <eos_printf>
8048541: a1 38 c0 04 08 mov 0x804c038,%eax
8048546: bc 00 00 00 00 mov $0x0,%esp
804854b: ff e0 jmp *%eax
804854d: 8b 4d fc mov -0x4(%ebp),%ecx
8048550: 31 c0 xor %eax,%eax
8048552: c7 05 34 c0 04 08 00 movl $0x0,0x804c034
8048559: 00 00 00
804855c: c9 leave
804855d: 8d 61 fc lea -0x4(%ecx),%esp
8048560: c3 ret
Disassembly of section .data:
0804c030 <_irq_mask>:
804c030: ff (bad)
804c031: ff (bad)
804c032: ff (bad)
804c033: ff 01 incl (%ecx)
0804c034 <_eflags>:
804c034: 01 00 add %eax,(%eax)
...
0804c038 <_vector>:
804c038: 1d 8d 04 08 1d sbb $0x1d08048d,%eax
804c03d: 8d 04 08 lea (%eax,%ecx,1),%eax
804c040: 1d 8d 04 08 37 sbb $0x3708048d,%eax
804c045: 8d 04 08 lea (%eax,%ecx,1),%eax
At 0x8048541, EAX register is set to 0x804c038
At 0x804854b, process jump to the address pointed by EAX register
At 0x804c048, the instruction is < sbb $0x1d08048d, %eax>
By the instruction manual, sbb is stand for dest = dest - (src+carry flag). So we can replace 0x804c048 instruction to %eax = $eax - ($0x1d08048d + carry flag).
Then.... at that time, what value is set to carry flag value?
I didn't find any carry flag setting instruction previous to the 0x804c048 line. Is the carry flag is initially set to 0?
And the second question is, at 0x804854b, process jump to *%eax value. After that, how the process return to main function? there is nothing return instruction in _vector section.
I'll be glad to your help. Thanks.
Oh........ #MarkPlotnick You are God to me...... I was totally trapped in the < sbb $0x1d08048d, %eax >.
In the assembly source code, _vector array and _os_reset_handler function is defined as below.
.data
.global _vector
_vector:
.long _os_reset_handler
.long _os_reset_handler
.long _os_reset_handler
.long _os_irq_handler
.text
.global _os_reset_handler
_os_reset_handler:
_CLI
lea _os_init_stack_end, %esp
call _os_initialization
jmp _os_reset_handler
-----------------------
_CLI is defined in another c header file as macro
#define _CLI \
movl $0, _eflags;
I was consistently wondering why _vector array is not contain _os_reset_handler address. I read the disassembled code again and found that the objdump misaligned the hexcode at _vector data. "0x1d (address at 0x804c03c)" didn't go to new line, so it interpreted to irrelevant assembly code. (I'm very unhappy. I didn't do any other work to catch this problem for 10 hours...)
Anyway. At the address 0x8048d1d, there is _os_reset_handler function.
08048d1d <_os_reset_handler>:
8048d1d: c7 05 34 c0 04 08 00 movl $0x0,0x804c034
8048d24: 00 00 00
8048d27: 8d 25 48 d0 04 08 lea 0x804d048,%esp
8048d2d: e8 07 01 00 00 call 8048e39 <_os_initialization>
8048d32: e9 e6 ff ff ff jmp 8048d1d <_os_reset_handler>
No more questions. Thanks.
I want to convert this assembly program to shellcode.
This program just creates a file , my purpose is how I should convert assembly to shellcode when I using extern command in it
My assmbly code is :
extern _fopen,_fclose
global main
section .text
main:
xor r10,r10
push r10
mov r13, 0x6277
push r13
mov rsi,rsp
push r10
mov r13, 0x726964656b616d
push r13
mov rdi,rsp
call _fopen
mov r14, rax
mov rdi, r14
call _fclose
mov rax, 0x2000001 ; exit
mov rdi, 0
syscall
I used this command to compile it :
nasm -f macho64 test2.asm
ld -o test -e main test2.o -lSystem
and I used objdum -d test to create shellcode
...........$ objdump -d test
test: file format mach-o-x86-64
Disassembly of section .text:
0000000000001f93 <main>:
1f93: 4d 31 d2 xor %r10,%r10
1f96: 41 52 push %r10
1f98: 41 bd 77 62 00 00 mov $0x6277,%r13d
1f9e: 41 55 push %r13
1fa0: 48 89 e6 mov %rsp,%rsi
1fa3: 41 52 push %r10
1fa5: 49 bd 6d 61 6b 65 64 movabs $0x726964656b616d,%r13
1fac: 69 72 00
1faf: 41 55 push %r13
1fb1: 48 89 e7 mov %rsp,%rdi
1fb4: e8 1d 00 00 00 callq 1fd6 <_fopen$stub>
1fb9: 49 89 c6 mov %rax,%r14
1fbc: 4c 89 f7 mov %r14,%rdi
1fbf: e8 0c 00 00 00 callq 1fd0 <_fclose$stub>
1fc4: b8 01 00 00 02 mov $0x2000001,%eax
1fc9: bf 00 00 00 00 mov $0x0,%edi
1fce: 0f 05 syscall
Disassembly of section __TEXT.__stubs:
0000000000001fd0 <_fclose$stub>:
1fd0: ff 25 3a 00 00 00 jmpq *0x3a(%rip) # 2010 <_fclose$stub>
0000000000001fd6 <_fopen$stub>:
1fd6: ff 25 3c 00 00 00 jmpq *0x3c(%rip) # 2018 <_fopen$stub>
Disassembly of section __TEXT.__stub_helper:
0000000000001fdc <__TEXT.__stub_helper>:
1fdc: 68 00 00 00 00 pushq $0x0
1fe1: e9 0a 00 00 00 jmpq 1ff0 <_fopen$stub+0x1a>
1fe6: 68 0e 00 00 00 pushq $0xe
1feb: e9 00 00 00 00 jmpq 1ff0 <_fopen$stub+0x1a>
1ff0: 4c 8d 1d 11 00 00 00 lea 0x11(%rip),%r11 # 2008 <>
1ff7: 41 53 push %r11
1ff9: ff 25 01 00 00 00 jmpq *0x1(%rip) # 2000 <>
1fff: 90 nop
In normal condition i used opcode in "main" section and conveted it to shellcode and used this code to run it
#include <sys/mman.h>
#include <inttypes.h>
#include <unistd.h>
char code[] = "\x4d\x31\xd2\x41\x52\x41...For Example ...";
int main()
{
int (*ret)() = (int (*)())code;
void *page = (void *)((uintptr_t)code & ~(getpagesize() - 1));
mprotect(page, sizeof code, PROT_EXEC);
ret();
return 0;
}
but in this case it dosen't work and I know I should used other sections opcodes mentioned below the main section , but I don't know the arrange of calling them.
Please guide me.
your assmbly code is written in x64 mode,are you sure that the loader-'main' program is also compile to x64?
This one I've tried with an Macho64-Binary
for i in $( otool -t test2.o | cut -d ' ' -f 2- | grep ' '); do echo -n '\\x'$i; done; echo