Passing an array (argv) to a syscall in assembly x64 - linux

I am learning to create shellcode and having a great time. I mostly understand what to do, and I can write asm that actually spawns a shell. To check my understanding, I wanted to execute a different program, namely cat.
I am using the method of building the strings on the stack from registers. However, I am running into an issue where I need to pass an array as the 'argv' parameter. This is simple enough for a shell: I can just pass the address of the address of the /bin/sh string on the stack. But with cat I need to pass both the program name /bin/cat and the argument for cat, i.e. /etc/issue.
I know that the layout for a syscall is:
rax : syscall ID
rdi : arg0
rsi : arg1
rdx : arg2
r10 : arg3
r8 : arg4
r9 : arg5
What I can't decipher is how to pass {"cat","/etc/issue"} into a single register, namely rsi.
My assembly:
global _start
section .text
_start:
;third argument
xor rdx,rdx
;second array member
xor rbx,rbx
push rbx ;null terminator for upcoming string
;push string in 2 parts
mov rbx,6374652f ;python '/etc/issue'[::-1].encode().hex()
push rbx
xor rbx,rbx
mov rbx, 0x65757373692f
push rbx
;first array member
xor rcx,rcx ;null terminator for upcoming string
add rcx,0x746163 ;python 'cat'[::-1].encode().hex()
push rcx
;first argument
xor rdi,rdi
push rdi ;null terminator for upcoming string
add rdi,7461632f6e69622f ;python '/bin/cat'[::-1].encode().hex()
push rdi
mov rdi,rsp
;execve syscall
xor rax,rax
add rax,59
;exit call
xor rdi,rdi
xor rax,rax
add rax,60
It runs but (as expected) aborts when a NULL is passed as argv.
I even tried just writing a C app that creates an array and quits and debugged that but I still didn't really understand what it was doing to create the array.

You're making this way more complicated than you need to. Here's all you need to do:
jmp .afterdata
.pathname:
db '/bin/' ; note lack of null terminator
.argv0:
db 'cat'
.endargv0:
db 1 ; we'll have to change the last byte to a null manually
.argv1:
db '/etc/issue'
.endargv1:
db 1 ; we'll have to change the last byte to a null manually
.afterdata:
xor eax, eax ; the null terminator for argv and envp
push rax
mov rdx, rsp ; rdx = envp
dec byte [rel .endargv1] ; change our 1 byte to a null byte
lea rax, [rel .argv1]
push rax
dec byte [rel .endargv0] ; change our 1 byte to a null byte
lea rax, [rel .argv0]
push rax
mov rsi, rsp ; rsi = argv
lea rdi, [rel .pathname]
xor eax, eax
mov al, 59 ; SYS_execve
syscall
; if you wanted to do an exit in case the execve fails, you could here, but for shellcode I don't see the point
You don't need to do any hex-encoding or reversing of strings by hand. You can just embed the strings you need in your shellcode and push their addresses onto the stack with rip-relative addressing. The only hoops we jump through are placing the data before the instructions that use it, so the rip-relative displacements are negative and contain no null bytes, and adding the null terminators to the strings at runtime.
Also, you generally want shellcode to be short. Notice how I point into the cat that's part of /bin/cat instead of storing it a second time, and reuse the null at the end of argv for envp.
By the way, if you want to try this as a standalone program, you'll need to pass -Wl,-N and -static to GCC, since the bytes it's modifying will be in the .text section (which is normally read-only). This won't be a problem when you're actually using it as shellcode, since it'll still be writable by whatever means you got it into memory in the first place.
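For comparison, here is a rough C sketch of the table the shellcode above builds on the stack. It is not part of the original answer: the helper name build_cat_argv and the three-slot array are made up for illustration, and the path is assumed to contain at least one '/'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the stack layout of the shellcode:
 *   slots[0] = argv[0] -> "cat" (points into the path string)
 *   slots[1] = argv[1] -> the file name
 *   slots[2] = NULL    -> terminates argv and also serves as an empty envp
 * Returns the envp pointer, i.e. the address of the shared NULL slot. */
char **build_cat_argv(char *path, char *file, char *slots[3])
{
    char *base = strrchr(path, '/');  /* last '/' in "/bin/cat" */
    assert(base != NULL);             /* assumption: path contains a '/' */
    slots[0] = base + 1;              /* like lea rax, [rel .argv0] */
    slots[1] = file;                  /* like lea rax, [rel .argv1] */
    slots[2] = NULL;                  /* like the first push rax (rax = 0) */
    return &slots[2];                 /* like mov rdx, rsp */
}
```

With char **envp = build_cat_argv(path, file, slots); the call execve(path, slots, envp) receives the same three arguments the shellcode places in rdi, rsi and rdx. The point is that rsi never holds the strings themselves, only the address of a NULL-terminated array of string addresses.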

Related

Why won't this stack string print in x64 NASM on macOS?

I've been able to successfully print a string using the sys_write to stdout on macOS. However, I cannot get this stack string to print using execve syscall with echo:
global _main
default rel
section .text
_main:
mov rbp, rsp
sub rsp, 32
mov rax, 'this a t'
mov [rbp-16], rax
mov rax, 'est'
mov [rbp-8], rax
mov rax, '/bin/ech'
mov [rbp-32], rax
xor rax, rax
mov al, 'o'
mov [rbp-24], rax
push 0
mov rax, 0
mov [rbp], rax
exit_program:
;rdi filename
;rsi argv
;rdx envp
lea rdi, [rbp-32]
lea rsi, [rbp-32]
mov rdx, 0
mov rax, 0x200003b
syscall
Currently, my return is EFAULT status code from execve.
The memory layout as shown in the screenshot is the string "This is a test" followed by null bytes for termination.
UPDATE: Trace output: execve("/bin/echo", [0x6863652f6e69622f, 0x6f, 0x7420612073696874, 0x747365], NULL) = -1 EFAULT (Bad address)
execve takes 3 args: a char* and two char *[] arrays, each terminated by a NULL pointer.
Your first arg is fine. It points to a zero-terminated array of ASCII characters which are a valid path.
Your argv is a char[], not char *[], because you passed the same value as your first arg! So when the system call interprets the data as an array of pointers to copy into the new process's arg array, it finds an invalid pointer 0x6863652f6e69622f as the first one. (The bytes of that pointer are ASCII codes.)
The trace output makes that pretty clear.
Your 3rd arg is NULL, not a pointer to NULL. Linux supports this, treating a NULL as an empty array. I don't know whether macOS does; if you still get EFAULT after passing a valid argv[], set RDX to a pointer to a qword 0 somewhere on the stack.
Keeping your existing setup code, you could change the last part to
lea rdi, [rbp-32] ; pointer to "/bin/echo"
push 0 ; NULL terminator
mov rdx, rsp ; envp = empty array
push some_reg ; holding a pointer to "this is a test"
push rdi ; pointer to "/bin/echo" = argv[0]
mov rsi, rsp ; argv
mov eax, 0x200003b ; execve call number, same as in your original code
syscall
Note that envp[] and argv[] are terminated by the same NULL pointer. If you wanted a non-empty envp you couldn't do that.
If this is supposed to be shellcode, you're going to need to replace the push 0 with pushing an xor-zeroed register, and it looks like you could simplify some of the other stuff. But get it working first.
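The values in the trace output can be reproduced with a few lines of C. This is only an illustrative sketch (the function name first_qword is made up, and it assumes a little-endian machine such as x86-64): it reads the first 8 bytes of a character buffer as if they were a pointer, which is what happens when a char[] is passed where a char *[] is expected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads the first 8 bytes of a buffer as one 64-bit value, the way the
 * kernel does when it treats that buffer as an array of pointers.
 * Assumes a little-endian machine and a buffer of at least 8 bytes. */
uint64_t first_qword(const char *buf)
{
    uint64_t value;
    memcpy(&value, buf, sizeof value);  /* safe for unaligned buffers */
    return value;
}
```

Applied to "/bin/echo", this yields 0x6863652f6e69622f, the first bogus "pointer" in the trace: the ASCII bytes of "/bin/ech" in little-endian order.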

push/pop segmentation fault in simple multiplication function

My teacher is doing a crash course in assembly with us, and I have no experience in it whatsoever. I am supposed to write a simple function that takes four variables and calculates (x+y)-(z+a) and then prints out the answer. I know it's a simple problem, but after hours of research I am getting nowhere; any push in the right direction would be very helpful! I do need to use the stack, as I have more things to add to the program once I get past this point, and will have a lot of variables to store. I am assembling and linking with nasm and gcc on Linux (x86-64).
(side question, my '3' isn't showing up in register r10, but I am in linux so this should be the correct register... any ideas?)
Here is my code so far:
global main
extern printf
segment .data
mulsub_str db "(%ld * %ld) - (%ld * %ld) = %ld",10,0
data dq 1, 2, 3, 4
segment .text
main:
call multiplyandsubtract
pop r9
mov rdi, mulsub_str
mov rsi, [data]
mov rdx, [data+8]
mov r10, [data+16]
mov r8, [data+24]
mov rax, 0
call printf
ret
multiplyandsubtract:
;;multiplies first function
mov rax, [data]
mov rdi, [data+8]
mul rdi
mov rbx, rdi
push rbx
;;multiplies second function
mov rax, [data+16]
mov rsi, [data+24]
mul rsi
mov rbx, rsi
push rbx
;;subtracts function 2 from function 1
pop rsi
pop rdi
sub rdi, rsi
push rdi
ret
push in the right direction
Nice pun!
Your problem is that ret uses the stack for the return address, which you don't seem to have accounted for. As such, push rdi; ret will just jump to the address in rdi and not return to your caller. Since that is unlikely to be a valid code address, you get a nice segfault.
To return values from functions just leave the result in a register, standard calling conventions normally use rax. Here is a possible version:
global main
extern printf
segment .data
mulsub_str db "(%ld * %ld) - (%ld * %ld) = %ld",10,0
data dq 1, 2, 3, 4
segment .text
main:
sub rsp, 8
call multiplyandsubtract
mov r9, rax
mov rdi, mulsub_str
mov rsi, [data]
mov rdx, [data+8]
mov rcx, [data+16] ; 4th integer argument of a function call goes in rcx (r10 is only used for syscalls)
mov r8, [data+24]
mov rax, 0
call printf
add rsp, 8
ret
multiplyandsubtract:
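;;first product: data[0] * data[1]
mov rax, [data]
mov rdi, [data+8]
mul rdi ; rdx:rax = rax * rdi
push rax ; save the product (rax), not the operand
;;second product: data[2] * data[3]
mov rax, [data+16]
mov rsi, [data+24]
mul rsi ; rdx:rax = rax * rsi
push rax ; save the second product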
;;subtracts function 2 from function 1
pop rsi
pop rdi
sub rdi, rsi
mov rax, rdi
ret
PS: notice I have also fixed the stack alignment as per the ABI. printf is known to be picky about that too.
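As a reference for checking the assembly output, the value that the format string "(%ld * %ld) - (%ld * %ld) = %ld" describes can be written in C. This is a sketch for comparison only; the function name is made up.

```c
#include <assert.h>

/* Reference for the value described by the format string:
 * d[0]*d[1] - d[2]*d[3]. The result is simply returned; on x86-64 Linux
 * the compiler hands it back in rax rather than on the stack. */
long multiply_and_subtract(const long d[4])
{
    long first = d[0] * d[1];
    long second = d[2] * d[3];
    return first - second;
}
```

For the data 1, 2, 3, 4 this gives (1 * 2) - (3 * 4) = -10, which is what the last %ld should print.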
To return more than 64 bits from a subroutine (when rax is not enough), you can either drop the standard ABI convention or actually follow it (there's surely a well-defined way to return more than 64 bits from subroutines), and use other registers until you run out of them.
And once you run out of spare return registers (or when you really want to use stack memory), you can do roughly what C++ compilers do:
SUB rsp,<return_data_size + alignment>
CALL subroutine
...
MOV al,[rsp + <offset>] ; to access some value from returned data
; <offset> = 0 to return_data_size-1, as defined by you when defining
; the memory layout for returned data structure
...
ADD rsp,<return_data_size + alignment> ; restore stack pointer
subroutine:
MOV al,<result_value_1>
MOV [rsp + 8 + <offset>],al ; store it into allocated stack space
; the +8 skips the return address, which was pushed
; on the stack by the CALL instruction. If you push more registers/data
; on the stack inside the subroutine, you will either have to recalculate
; all offsets in the following code, or use a C-like frame-pointer prologue:
PUSH rbp
MOV rbp,rsp
MOV [rbp + 16 + <offset>],al ; now all offsets are constant relative to rbp
... other code ...
; epilogue code restoring stack
MOV rsp,rbp ; optional, when you did use RSP and didn't restore it yet
POP rbp
RET
So during executing the instructions of subroutine, the stack memory layout is like this:
rsp -> current_top_of_stack (some temporary push/pop as needed)
+x ...
rbp -> original rbp value (if prologue/epilogue code was used)
+8 return address to caller
+16 allocated space for returning values
+16+return_data_size
... padding to have rsp correctly aligned by ABI requirements ...
+16+return_data_size+alignment
... other caller stack data or its own stack frame/return address ...
I'm not going to check how the ABI defines it, because I'm too lazy, but I hope this answer explains the principle well enough that you will recognize which way the ABI works and can adjust.
Then again, I would highly recommend using many shorter, simpler subroutines that each return a single value (in rax/eax/ax/al) whenever possible, following the SRP (Single Responsibility Principle). The approach above forces you to define a return-data structure, which may be too much hassle if it's just a temporary thing that can be split into single-value subroutines instead (if performance is a concern, inlining the whole subroutine will probably outperform even grouped return values with a single CALL).
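The principle of caller-reserved return space has a direct counterpart in C: the caller owns the storage and passes its address, and the callee only fills it in. The sketch below is an illustration under that assumption, with a made-up struct and function name; it does not claim to show how any particular ABI lays this out.

```c
#include <assert.h>

/* Hypothetical two-value result, standing in for "more than fits in rax". */
struct divmod {
    long quotient;
    long remainder;
};

/* The caller reserves the storage (like SUB rsp, <return_data_size>) and
 * passes its address; the callee writes the results there.
 * Assumes divisor != 0 and out != NULL. */
void divide(long dividend, long divisor, struct divmod *out)
{
    assert(out != 0 && divisor != 0);
    out->quotient = dividend / divisor;
    out->remainder = dividend % divisor;
}
```

A caller would declare struct divmod r; call divide(17, 5, &r); and then read r.quotient and r.remainder, just as the assembly caller reads [rsp + <offset>] after the CALL.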

I'm getting a segmentation fault in my assembly program [duplicate]

The tutorial I am following is for x86 and was written using 32-bit assembly; I'm trying to follow along while learning x64 assembly in the process. This has been going very well up until this lesson, where I have the following simple program that tries to modify a single character in a string. It assembles fine but segfaults when run.
section .text
global _start ; Declare global entry point for ld
_start:
jmp short message ; Jump to where our message is so we can do a call to push its address onto the stack
code:
xor rax, rax ; Clean up the registers
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
; Try to change the N to a space
pop rsi ; Get address from stack
mov al, 0x20 ; Load 0x20 into RAX
mov [rsi], al; Why segfault?
xor rax, rax; Clear again
; write(rdi, rsi, rdx) = write(file_descriptor, buffer, length)
mov al, 0x01 ; write the command for 64bit Syscall Write (0x01) into the lower 8 bits of RAX
mov rdi, rax ; First Parameter, RDI = 0x01 which is STDOUT, we move rax to ensure the upper 56 bits of RDI are zero
;pop rsi ; Second Parameter, RSI = Popped address of message from stack
mov dl, 25 ; Third Parameter, RDX = Length of message
syscall ; Call Write
; exit(rdi) = exit(return value)
xor rax, rax ; write returns # of bytes written in rax, need to clean it up again
add rax, 0x3C ; 64bit syscall exit is 0x3C
xor rdi, rdi ; Return value is in rdi (First parameter), zero it to return 0
syscall ; Call Exit
message:
call code ; Pushes the address of the string onto the stack
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
The culprit is this line:
mov [rsi], al; Why segfault?
If I comment it out, then the program runs fine, outputting the message 'AAAABBBNAAAAAAAABBBBBBBB'. Why can't I modify the string?
The author's code is the following:
global _start
_start:
jmp short ender
starter:
pop ebx ;get the address of the string
xor eax, eax
mov al, 0x20
mov [ebx+7], al ;put a NULL where the N is in the string
mov al, 4 ;syscall write
mov bl, 1 ;stdout is 1
pop ecx ;get the address of the string from the stack
mov dl, 25 ;length of the string
int 0x80
xor eax, eax
mov al, 1 ;exit the shellcode
xor ebx,ebx
int 0x80
ender:
call starter
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
And I've compiled that using:
nasm -f elf <infile> -o <outfile>
ld -m elf_i386 <infile> -o <outfile>
But even that causes a segfault. Images on the page show it working properly and changing the N into a space, but I seem to be stuck in segfault land :( Google isn't really being helpful in this case, so I turn to you, Stack Overflow. Any pointers (no pun intended!) would be appreciated.
I would assume it's because you're trying to access data that is in the .text section. Usually you're not allowed to write to code segment for security. Modifiable data should be in the .data section. (Or .bss if zero-initialized.)
For actual shellcode, where you don't want to use a separate section, see Segfault when writing to string allocated by db [assembly] for alternate workarounds.
Also I would never suggest using the side effects of call pushing the address after it to the stack to get a pointer to data following it, except for shellcode.
This is a common trick in shellcode (which must be position-independent); 32-bit mode needs a call to get EIP somehow. The call must have a backwards displacement to avoid 00 bytes in the machine code, so putting the call somewhere that creates a "return" address you specifically want saves an add or lea.
Even in 64-bit code where RIP-relative addressing is possible, jmp / call / pop is about as compact as jumping over the string for a RIP-relative LEA with a negative displacement.
Outside of the shellcode / constrained-machine-code use case, it's a terrible idea and you should just lea reg, [rel buf] like a normal person with the data in .data and the code in .text. (Or read-only data in .rodata.) This way you're not trying to execute code next to data, or put data next to code.
(Code-injection vulnerabilities that allow shellcode already imply the existence of a page with write and exec permission, but normal processes from modern toolchains don't have any W+X pages unless you do something to make that happen. W^X is a good security feature for this reason, so normal toolchain security features / defaults must be defeated to test shellcode.)
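For readers more familiar with C, the "put modifiable data in .data" advice corresponds to modifying an initialised array rather than bytes that live next to the code. The following is a minimal sketch, assuming a typical toolchain that places initialised non-const globals in a writable data section; the function name is made up.

```c
#include <assert.h>
#include <string.h>

/* An array initialised from a literal is a writable copy of the text,
 * unlike bytes assembled into .text. */
char message[] = "AAAABBBNAAAAAAAABBBBBBBB\n";

/* Replaces the first 'N' in s with a space; returns 1 if one was found. */
int blank_first_n(char *s)
{
    assert(s != NULL);
    char *p = strchr(s, 'N');
    if (p == NULL)
        return 0;
    *p = ' ';   /* the C counterpart of mov [rsi+7], al for this message */
    return 1;
}
```

The message is 25 bytes long including the newline, matching the length passed to write in the question.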

NASM get console size

I am new to NASM (and assembler in general) and I am looking for a way to get the console size (number of console columns and rows) in NASM, like AH=0Fh and INT 10h: http://en.wikipedia.org/wiki/INT_10H
Now, I understand that in NASM (and Linux in general) I cannot use BIOS interrupts, so there has to be another way.
The idea is to print some output to fill the screen and then wait for the user to press ENTER before printing more output.
If you are programming in Linux, then you must use the available system calls to achieve your aims. It's not that there are no interrupts. The system call itself is executed with an interrupt call. Outside of the kernel, however, you will be unable to access them and, since the kernel runs in protected mode, even if you could they likely wouldn't do what you would expect.
To answer your question: to obtain the console size you need the ioctl system call with the TIOCGWINSZ request. Its number is 0x36 (54) in EAX for the 32-bit int 0x80 interface; with the 64-bit syscall instruction it is 16 in RAX. I'd suggest that you have a read through the manual page for ioctl and you may also find this system call table very useful!
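It can help to see the same request from C before writing it in assembly. This is a minimal sketch using the standard ioctl call with TIOCGWINSZ and struct winsize from <sys/ioctl.h>; the wrapper name get_winsize is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>   /* ioctl, TIOCGWINSZ, struct winsize */

/* Fills *ws with the terminal size of fd. Returns 0 on success, or -1 if
 * fd is invalid or not a terminal (errno is set by ioctl). */
int get_winsize(int fd, struct winsize *ws)
{
    assert(ws != NULL);
    return ioctl(fd, TIOCGWINSZ, ws) == -1 ? -1 : 0;
}
```

After a successful get_winsize(1, &ws), the rows and columns are in ws.ws_row and ws.ws_col. In assembly the same three arguments (file descriptor, request, pointer to the structure) go into the first three syscall argument registers.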
This is a problem I had to deal with some time ago. The code for unistd.inc and termio.inc can be found here in the includes folder. The program and its makefile can be found in the tree programs/basics/terminal-winsize.
You can get the values for rows and columns on any terminal (console). xpixels and ypixels are only reported by some terminals (xterm yes, gnome-terminal depends). So if you don't get the x and y pixels (screen size) from a terminal, I guess that terminal is text-based. Correct me if there is another reason for this behaviour.
You can easily convert this program to 32 bits, since it uses nasmx macros for the syscalls. The only thing you have to do is replace the 64-bit registers with 32-bit registers and put some parameters in the right registers. Look for agguro on github to see all include files.
I hope this is helpful to you.
; Name: winsize
; Build: see makefile
; Run: ./winsize
; Description: Show the screen dimension of a terminal in rows/columns.
BITS 64
[list -]
%include "unistd.inc"
%include "termio.inc"
[list +]
section .bss
buffer: resb 5
.end:
.length: equ $-buffer
lf: resb 1
section .data
WINSIZE winsize
; keep the lengths the same or the 'array' construction will fail!
array: db "rows : "
db "columns : "
db "xpixels : "
db "ypixels : "
.length: equ $-array
.items: equ 4
.itemsize: equ array.length / array.items
section .text
global _start
_start:
mov BYTE[lf], 10 ; end of line in byte after buffer
; fetch the winsize structure data
syscall ioctl, STDOUT, TIOCGWINSZ, winsize
; initialize pointers and used variables
mov rsi, array ; pointer to array of strings
mov rcx, array.items ; items in array
.nextVariable:
; print the text associated with the winsize variable
push rcx ; save remaining strings to process
push rdx ; save winsize pointer
syscall write, STDOUT, rsi, array.itemsize
pop rax ; restore winsize pointer
push rax ; save winsize pointer
; convert variable to decimal
mov ax, WORD[rax] ; get value from winsize structure
mov rdi, buffer.end-1
.repeat:
xor rbx, rbx ; convert value in decimal
mov bx, 10
xor rdx, rdx
div bx
xchg rax, rdx
or al, "0"
std
stosb
xchg rax, rdx
cmp al, 0
jnz .repeat
push rsi ; save pointer to text
; print the variable value
mov rsi, rdi
mov rdx, buffer.end ; length of variable
sub rdx, rsi
inc rsi
syscall write, STDOUT, rsi, rdx
pop rsi
pop rdx
; calculate pointer to next variable value in winsize
add rdx, 2
; calculate pointer to next string in strings
add rsi, array.itemsize
; if all strings processed
pop rcx ; remaining arrayitems
loop .nextVariable
; exit the program
syscall exit, 0

NASM x86_64 having trouble writing command line arguments, returning -14 in rax

I am using elf64 compilation and trying to take a parameter and write it out to the console.
I am running the program as ./test wooop
After stepping through with gdb there seems to be no problem, everything is set up ok:
rax: 0x4
rbx: 0x1
rcx: pointing to string, x/6cb $rcx gives 'w' 'o' 'o' 'o' 'p' 0x0
rdx: 0x5 <---correctly determining length
after the int 80h rax contains -14 and nothing is printed to the console.
If I define a string in .data, it just works. gdb shows the value of $rcx in the same way.
Any ideas? here is my full source
%define LF 0Ah
%define stdout 1
%define sys_exit 1
%define sys_write 4
global _start
section .data
usagemsg: db "test {string}",LF,0
testmsg: db "wooop",0
section .text
_start:
pop rcx ;this is argc
cmp rcx, 2 ;one argument
jne usage
pop rcx
pop rcx ; argument now in rcx
test rcx,rcx
jz usage
;mov rcx, testmsg ;<-----uncomment this to print ok!
call print
jmp exit
usage:
mov rcx, usagemsg
call print
jmp exit
calclen:
push rdi
mov rdi, rcx
push rcx
xor rcx,rcx
not rcx
xor al,al
cld
repne scasb
not rcx
lea rdx, [rcx-1]
pop rcx
pop rdi
ret
print:
push rax
push rbx
push rdx
call calclen
mov rax, sys_write
mov rbx, stdout
int 80h
pop rdx
pop rbx
pop rax
ret
exit:
mov rax, sys_exit
mov rbx, 0
int 80h
Thanks
EDIT: After changing how I make my syscalls as below it works fine. Thanks all for your help!
sys_write is now 1
sys_exit is now 60
stdout now goes in rdi, not rbx
the string to write is now set in rsi, not rcx
int 80h is replaced by syscall
I'm still running 32-bit hardware, so this is a wild asmed guess! As you probably know, 64-bit system call numbers are completely different, and "syscall" is used instead of int 80h. However int 80h and 32-bit system call numbers can still be used, with 64-bit registers truncated to 32-bit. Your tests indicate that this works with addresses in .data, but with a "stack address", it returns -14 (-EFAULT - bad address). The only thing I can think of is that truncating rcx to ecx results in a "bad address" if it's on the stack. I don't know where the stack is in 64-bit code. Does this make sense?
I'd try it with "proper" 64-bit system call numbers and registers and "syscall", and see if that helps.
Best,
Frank
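The truncation guess above can be illustrated with a small C sketch. The function names are made up, and the two addresses in the usage note are example values only (a low address of the kind a statically linked .data section might have, and a high address of the kind the stack typically has in a 64-bit process).

```c
#include <assert.h>
#include <stdint.h>

/* Keeps only the low 32 bits of an address, which is all a 32-bit
 * register such as ecx can carry. */
uint32_t truncate_to_32(uint64_t address)
{
    return (uint32_t)address;
}

/* Returns 1 if the address is unchanged by truncation to 32 bits. */
int survives_truncation(uint64_t address)
{
    return (uint64_t)truncate_to_32(address) == address;
}
```

For example, survives_truncation(0x600000) is 1, while survives_truncation(0x7ffd12345678) is 0 because the truncated value is 0x12345678, which no longer points at the string.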
As you said, you're using ELF64 as the target of the compilation. This is where the problem starts. The "old" system call interface on Linux, int 80h, always uses the 32-bit ABI, even when it is invoked from a 64-bit task: the call numbers are the 32-bit ones and the arguments are taken from ebx, ecx, edx and so on, truncated to 32 bits. That is why a pointer into .data works while a pointer to the stack comes back as -14 (-EFAULT). You could simply assemble your source as ELF32, but then you're going to lose all the advantages of running tasks in 64-bit mode, namely the extra registers and 64-bit operations.
In order to make system calls in 64-bit tasks, the "new" system call interface should be used. The system call itself is done with the syscall instruction. The kernel destroys registers rcx and r11. The system call number is specified in the register rax, while the arguments of the call are passed in rdi, rsi, rdx, r10, r8 and r9. Keep in mind that the syscall numbers are different from the ones in 32-bit mode. You can find them in unistd_64.h, which is usually in /usr/include/asm or wherever your distribution stores it.
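The 64-bit interface can also be exercised from C through the syscall() function from glibc, which takes the call number followed by the arguments in the same order as the registers listed above. This is a sketch for experimentation, assuming Linux with glibc; the wrapper name raw_write is made up.

```c
#define _GNU_SOURCE       /* for the syscall() declaration in glibc */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall, pipe, read */

/* Issues the native write system call directly, bypassing the write()
 * wrapper. On x86-64 this corresponds to rax = SYS_write, rdi = fd,
 * rsi = s, rdx = length. Returns the number of bytes written or -1. */
long raw_write(int fd, const char *s)
{
    assert(s != NULL);
    return syscall(SYS_write, fd, s, strlen(s));
}
```

Calling raw_write(1, "wooop") writes the five bytes to stdout, which mirrors the corrected assembly from the question's edit: number in rax, file descriptor in rdi, buffer in rsi, length in rdx.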

Resources