Assembly x86 brk() call use - linux

I am trying to dynamically allocate memory into the heap and then assign values in those memory addresses. I understand how to allocate the memory but how would I assign for example the value in a register to that first dynamic memory address?
This is what I have so far:
push rbp
mov rbp, rsp ;initialize an empy stack to create activation records for the rest of the subroutines
mov rax, 0x2d ;linux system call for brk()
mov rbx, 0x0 ;to get the adress of the first adress we are allocating we must have 0 in rbx
int 0x80 ;calls the linux operating system kernel for assistance
mov [brk_firstLocation], rax ;the first position in the heap will be returned in rax thus i save the first loaction in a varable called brk_firstLocation
mov rbx, rax ;the memory adress of the start of the heap is moved in rbx
add rbx, 0x14 ;we want 5 bytes worth of data alocated in the heap, so the start adress plus 20 bits
mov rax, 0x2d ;linux system call for brk()
int 0x80 ;calls the linux operating system kernel for assistance
What would I do, for example, to mov the value in rax into brk_firstLocation

others have pointed out a few things that are wrong with your code. I would like to add that you would not add 20 bits to the current breakpoint (or 20 bytes like add rbx, 20 actually does), you would simply add 5 bytes.
Also, your first syscall argument will not be in rbx, it will be in rdi. The 64-bit syscall ABI uses different system call numbers, different registers, and a different instruction (syscall instead of int 0x80) than the 32-bit ABI (which is still available in 64-bit processes). See also the x86 tag wiki for more ABI links.
Here's how your code should look:
push rbp
mov rbp, rsp
;; sys_brk(0)
mov rax, 12 ; 12 is SYS_brk (/usr/include/asm/unistd_64.h)
mov rdi, 0 ; rdi for first syscall arg in the 64-bit ABI, not rbx
syscall ; syscall, not int 0x80, for the 64-bit ABI
mov qword [brk_firstLocation], rax
;; sys_brk(old_break + 5)
lea rdi, [rax + 5] ; add 5 bytes to the break point
mov rax, 12
syscall ; set the new breakpoint
At this point you can use brk_firstLocation as a pointer to whatever 5 byte struct you want to store on the heap. Here's how you would put values in that memory space:
mov rdi, [brk_firstLocation] ; load the pointer from memory, if you didn't already have it in a register
mov byte [rdi], 'A' ; a char in the first byte
mov [rdi+1], ecx ; a 32-bit value in the last 4 bytes.

int 80h is only for 32 bit system calls. Use syscall for 64 bit instead.
Calling sys_brk twice is redundant - in assembly you always know where your program data ends. Simply put a label there and you will have the address.
Allocating this way memory less than one page is pointless, it will be allocated in blocks of 4KB anyway.
It is important to be understood - sys_brk is not heap management function. It is low level memory management.

There are a number of problems I see:
0x2d is the brk system call on x86 (32 bit); on x86_64 it's 0xc
brk sets the end of the data segment; it returns 0 on success and -1 on failure. It does not return "the first position in the heap". That comes from the symbol _end which the linker sets to the end of the uninitialized preallocated data.
So you want something like:
mov [brk_firstloaction], _end
mov rbx, [brk_firstlocation]
add rbx, 0x14 ; space for 5 dwords (20 bytes)
mov rax, 12
int 0x80

As you have done, call once to retrieve the current bottom of heap, then move the top of heap (which is the brk value). However your code is not correct in using the 64-bit register set r.x. If your Linux is 32-bit (as implied by the use of 45 for the syscall number), then you want the 32-bit register set:
mov eax, 45 ; brk
mov ebx, 0 ; arg 1: 0 = fail returning brk value in rax
int 0x80 ; syscall
mov dword ptr [brk_firstLocation], rax ; save result
mov eax, 45 ; brk
mov ebx, 4 ; arg 1: allocate 4 bytes for a 32-bit int
int 0x80 ; syscall
mov eax, dword ptr [brk_firstLocation] ; reload rax with allocated memory address.
mov dword ptr [eax], 42 ; use result: store 42 in newly allocated storage
Of course you can re-load the saved value for re-use as many times as needed.

Related

Compact shellcode to print a 0-terminated string pointed-to by a register, given puts or printf at known absolute addresses?

Background: I am a beginner trying to understand how to golf assembly, in particular to solve an online challenge.
EDIT: clarification: I want to print the value at the memory address of RDX. So “SUPER SECRET!”
Create some shellcode that can output the value of register RDX in <= 11 bytes. Null bytes are not allowed.
The program is compiled with the c standard library, so I have access to the puts / printf statement. It’s running on x86 amd64.
$rax : 0x0000000000010000 → 0x0000000ac343db31
$rdx : 0x0000555555559480 → "SUPER SECRET!"
gef➤ info address puts
Symbol "puts" is at 0x7ffff7e3c5a0 in a file compiled without debugging.
gef➤ info address printf
Symbol "printf" is at 0x7ffff7e19e10 in a file compiled without debugging.
Here is my attempt (intel syntax)
xor ebx, ebx ; zero the ebx register
inc ebx ; set the ebx register to 1 (STDOUT
xchg ecx, edx ; set the ECX register to RDX
mov edx, 0xff ; set the length to 255
mov eax, 0x4 ; set the syscall to print
int 0x80 ; interrupt
hexdump of my code
My attempt is 17 bytes and includes null bytes, which aren't allowed. What other ways can I lower the byte count? Is there a way to call puts / printf while still saving bytes?
FULL DETAILS:
I am not quite sure what is useful information and what isn't.
File details:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5810a6deb6546900ba259a5fef69e1415501b0e6, not stripped
Source code:
void main() {
char* flag = get_flag(); // I don't get access to the function details
char* shellcode = (char*) mmap((void*) 0x1337,12, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mprotect(shellcode, 12, PROT_READ | PROT_WRITE | PROT_EXEC);
fgets(shellcode, 12, stdin);
((void (*)(char*))shellcode)(flag);
}
Disassembly of main:
gef➤ disass main
Dump of assembler code for function main:
0x00005555555551de <+0>: push rbp
0x00005555555551df <+1>: mov rbp,rsp
=> 0x00005555555551e2 <+4>: sub rsp,0x10
0x00005555555551e6 <+8>: mov eax,0x0
0x00005555555551eb <+13>: call 0x555555555185 <get_flag>
0x00005555555551f0 <+18>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551f4 <+22>: mov r9d,0x0
0x00005555555551fa <+28>: mov r8d,0xffffffff
0x0000555555555200 <+34>: mov ecx,0x22
0x0000555555555205 <+39>: mov edx,0x0
0x000055555555520a <+44>: mov esi,0xc
0x000055555555520f <+49>: mov edi,0x1337
0x0000555555555214 <+54>: call 0x555555555030 <mmap#plt>
0x0000555555555219 <+59>: mov QWORD PTR [rbp-0x10],rax
0x000055555555521d <+63>: mov rax,QWORD PTR [rbp-0x10]
0x0000555555555221 <+67>: mov edx,0x7
0x0000555555555226 <+72>: mov esi,0xc
0x000055555555522b <+77>: mov rdi,rax
0x000055555555522e <+80>: call 0x555555555060 <mprotect#plt>
0x0000555555555233 <+85>: mov rdx,QWORD PTR [rip+0x2e26] # 0x555555558060 <stdin##GLIBC_2.2.5>
0x000055555555523a <+92>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555523e <+96>: mov esi,0xc
0x0000555555555243 <+101>: mov rdi,rax
0x0000555555555246 <+104>: call 0x555555555040 <fgets#plt>
0x000055555555524b <+109>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555524f <+113>: mov rdx,QWORD PTR [rbp-0x8]
0x0000555555555253 <+117>: mov rdi,rdx
0x0000555555555256 <+120>: call rax
0x0000555555555258 <+122>: nop
0x0000555555555259 <+123>: leave
0x000055555555525a <+124>: ret
Register state right before shellcode is executed:
$rax : 0x0000000000010000 → "EXPLOIT\n"
$rbx : 0x0000555555555260 → <__libc_csu_init+0> push r15
$rcx : 0x000055555555a4e8 → 0x0000000000000000
$rdx : 0x0000555555559480 → "SUPER SECRET!"
$rsp : 0x00007fffffffd940 → 0x0000000000010000 → "EXPLOIT\n"
$rbp : 0x00007fffffffd950 → 0x0000000000000000
$rsi : 0x4f4c5058
$rdi : 0x00007ffff7fa34d0 → 0x0000000000000000
$rip : 0x0000555555555253 → <main+117> mov rdi, rdx
$r8 : 0x0000000000010000 → "EXPLOIT\n"
$r9 : 0x7c
$r10 : 0x000055555555448f → "mprotect"
$r11 : 0x246
$r12 : 0x00005555555550a0 → <_start+0> xor ebp, ebp
$r13 : 0x00007fffffffda40 → 0x0000000000000001
$r14 : 0x0
$r15 : 0x0
(This register state is a snapshot at the assembly line below)
●→ 0x555555555253 <main+117> mov rdi, rdx
0x555555555256 <main+120> call rax
Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:
Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).
This is a generalization of the code-golf trick of lea edi, [rax+1] trick to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.
The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.
Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.
See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
bits 64
lea rsi, [rdi - 0x166f30]
;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
push rdx
pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
jmp rsi ; tailcall puts
This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)
The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.
$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours
My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr which mmap picks after the amusing choice of 0x1337 as a hint address, which isn't even page aligned but doesn't result in EIVAL on current Linux.)
Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.
Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.
The input is read using fgets (apparently to simulate a gets() overflow).
fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?
Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.
push rdx
pop rsi
shr eax, 16 ; fun 3-byte way to turn 0x10000` into `1`, __NR_write 64-bit, instead of just push 1 / pop
mov edi, eax ; STDOUT_FD = __NR_write
lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
syscall
But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).
mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)

Posix memalign fails on Broadwell but not on Skylake

The following code uses Posix Memalign to allocate four buffers per core, of 128 bytes each. It succeeds on a Skylake but it fails on a Broadwell (earlier generation) with the following message:
posix memalign malloc.c:2401: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed
According to the Linux man pages, memalign will fail in the following cases: (1) the alignment argument was not a power of two, or was not a multiple of sizeof(void *), or (2) there was insufficient memory to fulfill the allocation request. As it generates a segmentation fault, I can't get an error number from rax. The Broadwell has 8GB of memory, so insufficient memory is not the problem. The alignment is 64 so that's not the problem, and in any case it succeeds on my Skylake, so it's written correctly.
Here's the relevant code block:
mov rax,2 ; number of cores
mov rbx,4 ; number of buffers to create
mul rbx
mov r14,rax
xor r13,r13
Memalign_Create_Loop:
mov rax,r15 ; number of cores
mov rbx,r12 ; number of buffers needed
; N ptrs per core x 8 bytes per pointer
mul rbx ; times the number of cores
mov r12,rax ; number of buffers needed x number of cores
lea rdi,[memalign_pointer]
mov rsi,64 ; alignment
mov rdx,r12
shl rdx,3
mov rdx,128 ; buffer size
;xor rax,rax
sub rsp,40
call posix_memalign wrt ..plt
add rsp,40
lea r8,[internal_buffer_pointers]
lea rdi,[memalign_pointer]
mov rax,[rdi]
mov rbx,r13
shl rbx,3
mov [r8+rbx],rax
add r13,1
cmp r13,r14
jl Memalign_Create_Loop
It fails at "call posix_memalign wrt ..plt" and displays the error message shown above, along with a segmentation fault message.
It's a puzzle to me because it succeeds on the Skylake, and posix_memalign predates Broadwell.
I assemble and link with:
sudo nasm -f elf64 -g -F dwarf NWL.asm
sudo ld -shared NWL.o /opt/P01_SH/_Library/Create_Threads_in_C-NWL.o /opt/P01_SH/_Library/Timer_for_NASM.o /opt/P01_SH/_Library/POSIX_Shared_Memory.o /opt/P01_SH/_Library/PThread_Mutex.o /opt/P01_SH/_Library/Create_Multi_Files.o -ldl -lrt -lpthread -lm -o NWL.so
Thanks for any ideas on this.
I solved this problem working from the comment above by kisch ("The assertion message seems to indicate a case of heap corruption. Then the actual bug would be in earlier code, the corruption is only detected once your function tries another allocation").
My NASM code includes two files with memory allocation code. Before the fix it was:
; Create buffers to hold POSIX shared memory:
mov rbx,2 ; 2 output buffers per core
%include "/opt/P01_SH/_Include_Utilities/POSIX_Shared_Memory.asm"
mov r15,1 ;[Number_Of_Cores_Seq]
mov r12,4 ; number of internal_buffers needed
%include "/opt/P01_SH/_Include_Utilities/Posix_Memalign_Internal.asm"
To fix this, I reversed the order of the calls so that Posix_Memalign_Internal.asm is included before Posix_Shared_Memory.asm.
This is the code from Posix_Memalign_Internal.asm:
mov rax,r15
mov rbx,r12 ; passed in from calling function
mul rbx
mov r14,rax
xor r13,r13
Memalign_Create_Loop:
; Allocate the buffer for the memory pointers
; # of cores times N memory pointers needed (passed in from caller)
; int posix_memalign(void **memptr, size_t alignment, size_t size);
mov rax,r15 ; number of cores
mov rbx,r12 ; number of buffers needed
; N ptrs per core x 8 bytes per pointer
mul rbx ; times the number of cores
mov r12,rax ; number of buffers needed x number of cores
lea rdi,[memalign_pointer]
mov rsi,64 ; alignment
mov rdx,r12
shl rdx,3
mov rdx,128 ; buffer size
sub rsp,40
call posix_memalign wrt ..plt
add rsp,40
lea r8,[internal_buffer_pointers]
lea rdi,[memalign_pointer]
mov rax,[rdi]
mov rbx,r13
shl rbx,3
mov [r8+rbx],rax
add r13,1
cmp r13,r14
jl Memalign_Create_Loop
This is the code from Posix_Shared_Memory.asm:
mov rax,[Number_Of_Cores_Seq]
mul rbx
mov [shm_buffers_count],rax
shl rax,3
mov [shm_buffers_bytes],rax
mov r12,rax ; length of pointer buffer
; Buffer for shm pointers
mov rdi,r12
sub rsp,40
call [rel malloc wrt ..got]
mov qword [shm_buffers_ptr],rax
add rsp,40
; Buffer for shm fds
mov rdi,r12
sub rsp,40
call [rel malloc wrt ..got]
mov qword [shm_fds_ptr],rax
add rsp,40
; _________
; Allocate shared memory
xor r12,r12 ; buffer # (0, 8, 16, 24)
xor r13,r13 ; core# (0, 1, 2, 3, ...)
xor r14,r14 ; counter for shm_buffer
Get_ShmName:
lea rdi,[shm_base_name]
mov rsi,r12
mov rdx,r13
call [rel get_shm_name wrt ..got]
mov rdi,rax
push rdi ; shm_name ptr
call [rel unlink_shm wrt ..got]
pop rdi
; Create the shm buffer
mov rsi,[shm_size]
call [rel open_shm wrt ..got]
get_buffer:
mov rdi,rax
mov rax,[rdi]
get_buffer_data:
mov rbp,[shm_fds_ptr]
mov [rbp+r14],rax ; shm_fds
mov rbp,[shm_buffers_ptr]
mov rax,[rdi+8]
mov [rbp+r14],rax ; shm_ptrs
add r12,8
add r14,8
cmp r12,[shm_buffers_bytes]
jl Get_ShmName
next_core:
xor r12,r12
add r13,1
cmp r13,[Number_Of_Cores_Seq]
jl Get_ShmName
Posix_Memalign_Internal.asm uses posix_memalign to create four small buffers of 128 bytes each. Posix_Shared_Memory.asm creates two 4MB shared memory buffers.
My conclusion from this is that when posix_memalign and posix shared memory are used in the same program, posix_memalign should be called before posix shared memory buffers are created. In the final executable they appear on after the other with three instructions in between.
I don't see anything in the code blocks that explains why this is so, but it fixed this problem here.
As a final note, it failed today on a Skylake, so the CPU family is not relevant, as Nate Eldredge said in his comment above.
Thanks to all who responded.

Print newline with as little code as possible with NASM

I'm learning a bit of assembly for fun and I am probably too green to know the right terminology and find the answer myself.
I want to print a newline at the end of my program.
Below works fine.
section .data
newline db 10
section .text
_end:
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
mov rax, 60
mov rdi, 0
syscall
But I'm hoping to achieve the same result without defining the newline in .data. Is it possible to call sys_write directly with the byte you want, or must it always be done with a reference to some predefined data (which I assume is what mov rsi, newline is doing)?
In short, why can't I replace mov rsi, newline by mov rsi, 10?
You always need the data in memory to copy it to a file-descriptor. There is no system-call equivalent of C stdio fputc that takes data by value instead of by pointer.
mov rsi, newline puts a pointer into a register (with a huge mov r64, imm64 instruction). sys_write doesn't special-case size=1 and treat its void *buf arg as a char value if it's not a valid pointer.
There aren't any other system calls that would do the trick. pwrite and writev are both more complicated (taking a file offset as well as a pointer, or taking an array of pointer+length to gather the data in kernel space).
There is a lot you can do to optimize this for code-size, though. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
First, putting the newline character in static storage means you need to generate a static address in a register. Your options here are:
5-bytes mov esi, imm32 (only in Linux non-PIE executables, so static addresses are link-time constants and are known to be in the low 2GiB of virtual address space and thus work as 32-bit zero-extended or sign-extended)
7-byte lea rsi, [rel newline] Works everywhere, the only good option if you can't use the 5-byte mov-immediate.
10-byte mov rsi, imm64. This works even in PIE executables (e.g. if you link with gcc -nostdlib without -static, on a distro where PIE is the default.) But only via a runtime relocation fixup, and the code-size is terrible. Compilers never use this because it's not faster than LEA.
But like I said, we can avoid static addressing entirely: Use push to put immediate data on the stack. This works even if we need zero-terminated strings, because push imm8 and push imm32 both sign-extend the immediate to 64-bit. Since ASCII uses the low half of the 0..255 range, this is equivalent to zero-extension.
Then we just need to copy RSP to RSI, because push leave RSP pointing to the data that was pushed. mov rsi, rsp would be 3 bytes because it needs a REX prefix. If you were targeting 32-bit code or the x32 ABI (32-bit pointers in long mode) you could use 2-byte mov esi, esp. But Linux puts the stack pointer at top of user virtual address space, so on x86-64 that's 0x007ff..., right at the top of the low canonical range. So truncating a pointer to stack memory to 32 bits isn't an option; we'd get -EFAULT.
But we can copy a 64-bit register with 1-byte push + 1-byte pop. (Assuming neither register needs a REX prefix to access.)
default rel ; We don't use any explicit addressing modes, but no reason to leave this out.
_start:
push 10 ; \n
push rsp
pop rsi ; 2 bytes total vs. 3 for mov rsi,rsp
push 1 ; _NR_write call number
pop rax ; 3 bytes, vs. 5 for mov edi, 1
mov edx, eax ; length = call number by coincidence
mov edi, eax ; fd = length = call number also coincidence
syscall ; write(1, "\n", 1)
mov al, 60 ; assuming write didn't return -errno, replace the low byte and keep the high zeros
;xor edi, edi ; leave rdi = 1 from write
syscall ; _exit(1)
.size: db $ - _start
xor-zeroing is the most well-known x86 peephole optimization: it saves 3 bytes of code size, and is actually more efficient than mov edi, 0. But you only asked for the smallest code to print a newline, without specifying that it had to exit with status = 0. So we can save 2 bytes by leaving that out.
Since we're just making an _exit system call, we don't need to clean up the stack from the 10 we pushed.
BTW, this will crash if the write returns an error. (e.g. redirected to /dev/full, or closed with ./newline >&-, or whatever other condition.) That would leave RAX=-something, so mov al, 60 would give us RAX=0xffff...3c. Then we'd get -ENOSYS from the invalid call number, and fall off the end of _start and decode whatever is next as instructions. (Probably zero bytes which decode with [rax] as an addressing mode. Then we'd fault with a SIGSEGV.)
objdump -d -Mintel disassembly of that code, after building with nasm -felf64 and linking with ld
0000000000401000 <_start>:
401000: 6a 0a push 0xa
401002: 54 push rsp
401003: 5e pop rsi
401004: 6a 01 push 0x1
401006: 58 pop rax
401007: 89 c2 mov edx,eax
401009: 89 c7 mov edi,eax
40100b: 0f 05 syscall
40100d: b0 3c mov al,0x3c
40100f: 0f 05 syscall
0000000000401011 <_start.size>:
401011: 11 .byte 0x11
So the total code-size is 0x11 = 17 bytes. vs. your version with 39 bytes of code + 1 byte of static data. Your first 3 mov instructions alone are 5, 5, and 10 bytes long. (Or 7 bytes long for mov rax,1 if you use YASM which doesn't optimize it to mov eax,1).
Running it:
$ strace ./newline
execve("./newline", ["./newline"], 0x7ffd4e98d3f0 /* 54 vars */) = 0
write(1, "\n", 1
) = 1
exit(1) = ?
+++ exited with 1 +++
If this was part of a larger program:
If you already have a pointer to some nearby static data in a register, you could do something like a 4-byte lea rsi, [rdx + newline-foo] (REX.W + opcode + modrm + disp8), assuming the newline-foo offset fits in a sign-extended disp8 and that RDX holds the address of foo.
Then you can have newline: db 10 in static storage after all. (Put it .rodata or .data, depending on which section you already had a pointer to).
It expects an address of the string in rsi register. Not a character or string.
mov rsi, newline loads the address of newline into rsi.

Writing to allocated memory with sbrk results in segfault [duplicate]

I am trying to dynamically allocate memory into the heap and then assign values in those memory addresses. I understand how to allocate the memory but how would I assign for example the value in a register to that first dynamic memory address?
This is what I have so far:
push rbp
mov rbp, rsp ;initialize an empy stack to create activation records for the rest of the subroutines
mov rax, 0x2d ;linux system call for brk()
mov rbx, 0x0 ;to get the adress of the first adress we are allocating we must have 0 in rbx
int 0x80 ;calls the linux operating system kernel for assistance
mov [brk_firstLocation], rax ;the first position in the heap will be returned in rax thus i save the first loaction in a varable called brk_firstLocation
mov rbx, rax ;the memory adress of the start of the heap is moved in rbx
add rbx, 0x14 ;we want 5 bytes worth of data alocated in the heap, so the start adress plus 20 bits
mov rax, 0x2d ;linux system call for brk()
int 0x80 ;calls the linux operating system kernel for assistance
What would I do, for example, to mov the value in rax into brk_firstLocation
others have pointed out a few things that are wrong with your code. I would like to add that you would not add 20 bits to the current breakpoint (or 20 bytes like add rbx, 20 actually does), you would simply add 5 bytes.
Also, your first syscall argument will not be in rbx, it will be in rdi. The 64-bit syscall ABI uses different system call numbers, different registers, and a different instruction (syscall instead of int 0x80) than the 32-bit ABI (which is still available in 64-bit processes). See also the x86 tag wiki for more ABI links.
Here's how your code should look:
push rbp
mov rbp, rsp
;; sys_brk(0)
mov rax, 12 ; 12 is SYS_brk (/usr/include/asm/unistd_64.h)
mov rdi, 0 ; rdi for first syscall arg in the 64-bit ABI, not rbx
syscall ; syscall, not int 0x80, for the 64-bit ABI
mov qword [brk_firstLocation], rax
;; sys_brk(old_break + 5)
lea rdi, [rax + 5] ; add 5 bytes to the break point
mov rax, 12
syscall ; set the new breakpoint
At this point you can use brk_firstLocation as a pointer to whatever 5 byte struct you want to store on the heap. Here's how you would put values in that memory space:
mov rdi, [brk_firstLocation] ; load the pointer from memory, if you didn't already have it in a register
mov byte [rdi], 'A' ; a char in the first byte
mov [rdi+1], ecx ; a 32-bit value in the last 4 bytes.
int 80h is only for 32 bit system calls. Use syscall for 64 bit instead.
Calling sys_brk twice is redundant - in assembly you always know where your program data ends. Simply put a label there and you will have the address.
Allocating this way memory less than one page is pointless, it will be allocated in blocks of 4KB anyway.
It is important to be understood - sys_brk is not heap management function. It is low level memory management.
There are a number of problems I see:
0x2d is the brk system call on x86 (32 bit); on x86_64 it's 0xc
brk sets the end of the data segment; it returns 0 on success and -1 on failure. It does not return "the first position in the heap". That comes from the symbol _end which the linker sets to the end of the uninitialized preallocated data.
So you want something like:
mov [brk_firstloaction], _end
mov rbx, [brk_firstlocation]
add rbx, 0x14 ; space for 5 dwords (20 bytes)
mov rax, 12
int 0x80
As you have done, call once to retrieve the current bottom of heap, then move the top of heap (which is the brk value). However your code is not correct in using the 64-bit register set r.x. If your Linux is 32-bit (as implied by the use of 45 for the syscall number), then you want the 32-bit register set:
mov eax, 45 ; brk
mov ebx, 0 ; arg 1: 0 = fail returning brk value in rax
int 0x80 ; syscall
mov dword ptr [brk_firstLocation], rax ; save result
mov eax, 45 ; brk
mov ebx, 4 ; arg 1: allocate 4 bytes for a 32-bit int
int 0x80 ; syscall
mov eax, dword ptr [brk_firstLocation] ; reload rax with allocated memory address.
mov dword ptr [eax], 42 ; use result: store 42 in newly allocated storage
Of course you can re-load the saved value for re-use as many times as needed.

I'm getting a segmentation fault in my assembly program [duplicate]

The tutorial I am following is for x86 and was written using 32-bit assembly, I'm trying to follow along while learning x64 assembly in the process. This has been going very well up until this lesson where I have the following simple program which simply tries to modify a single character in a string; it compiles fine but segfaults when ran.
section .text
global _start ; Declare global entry oint for ld
_start:
jmp short message ; Jump to where or message is at so we can do a call to push the address onto the stack
code:
xor rax, rax ; Clean up the registers
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
; Try to change the N to a space
pop rsi ; Get address from stack
mov al, 0x20 ; Load 0x20 into RAX
mov [rsi], al; Why segfault?
xor rax, rax; Clear again
; write(rdi, rsi, rdx) = write(file_descriptor, buffer, length)
mov al, 0x01 ; write the command for 64bit Syscall Write (0x01) into the lower 8 bits of RAX
mov rdi, rax ; First Paramter, RDI = 0x01 which is STDOUT, we move rax to ensure the upper 56 bits of RDI are zero
;pop rsi ; Second Parameter, RSI = Popped address of message from stack
mov dl, 25 ; Third Parameter, RDX = Length of message
syscall ; Call Write
; exit(rdi) = exit(return value)
xor rax, rax ; write returns # of bytes written in rax, need to clean it up again
add rax, 0x3C ; 64bit syscall exit is 0x3C
xor rdi, rdi ; Return value is in rdi (First parameter), zero it to return 0
syscall ; Call Exit
message:
call code ; Pushes the address of the string onto the stack
db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A
This culprit is this line:
mov [rsi], al; Why segfault?
If I comment it out, then the program runs fine, outputting the message 'AAAABBBNAAAAAAAABBBBBBBB', why can't I modify the string?
The authors code is the following:
global _start
_start:
jmp short ender
starter:
pop ebx ;get the address of the string
xor eax, eax
mov al, 0x20
mov [ebx+7], al ;put a NULL where the N is in the string
mov al, 4 ;syscall write
mov bl, 1 ;stdout is 1
pop ecx ;get the address of the string from the stack
mov dl, 25 ;length of the string
int 0x80
xor eax, eax
mov al, 1 ;exit the shellcode
xor ebx,ebx
int 0x80
ender:
call starter
db 'AAAABBBNAAAAAAAABBBBBBBB'0x0A
And I've compiled that using:
nasm -f elf <infile> -o <outfile>
ld -m elf_i386 <infile> -o <outfile>
But even that causes a segfault, images on the page show it working properly and changing the N into a space, however I seem to be stuck in segfault land :( Google isn't really being helpful in this case, and so I turn to you stackoverflow, any pointers (no pun intended!) would be appreciated
I would assume it's because you're trying to access data that is in the .text section. Usually you're not allowed to write to code segment for security. Modifiable data should be in the .data section. (Or .bss if zero-initialized.)
For actual shellcode, where you don't want to use a separate section, see Segfault when writing to string allocated by db [assembly] for alternate workarounds.
Also I would never suggest using the side effects of call pushing the address after it to the stack to get a pointer to data following it, except for shellcode.
This is a common trick in shellcode (which must be position-independent); 32-bit mode needs a call to get EIP somehow. The call must have a backwards displacement to avoid 00 bytes in the machine code, so putting the call somewhere that creates a "return" address you specifically want saves an add or lea.
Even in 64-bit code where RIP-relative addressing is possible, jmp / call / pop is about as compact as jumping over the string for a RIP-relative LEA with a negative displacement.
Outside of the shellcode / constrained-machine-code use case, it's a terrible idea and you should just lea reg, [rel buf] like a normal person with the data in .data and the code in .text. (Or read-only data in .rodata.) This way you're not trying execute code next to data, or put data next to code.
(Code-injection vulnerabilities that allow shellcode already imply the existence of a page with write and exec permission, but normal processes from modern toolchains don't have any W+X pages unless you do something to make that happen. W^X is a good security feature for this reason, so normal toolchain security features / defaults must be defeated to test shellcode.)

Resources