Endianness followed in NASM 64 bit programming - nasm

I have 2 situations :
I.
arr dq 1234567887654321H
mov rsi,arr
mov rbx,[rsi]
Now, as we know that rsi always points to 1 byte of location in memory and x86 follows little-endian. Does rsi points to 21H and then this 21 gets into rbx or the complete value in arr gets transfered to rbx ?
II.
tempbuff resb 16
arr resb 1234567887654321H
mov rbx,qword[arr]
mov rsi,tempbuff
mov [rsi],rbx
Above statements are taken from different sections and combined here so as to focus on important details.
Now, from the above statements, rbx stores the entire contents of arr.
rsi points to 1st memory location of tempbuff. Then does the mov [rsi],rbx
stores the entire content of rbx to tempbuff OR does it simply stores the lowest 1 byte of rbx(here 21) into the the location pointed by rsi(1 byte location) ?

For case 1:
mov rbx,[rsi]
Nasm implicitly resolves memory size pointed by right hand side ,[r64] based on left hand side target which is a r64. Hence rsi is 64bit address pointing to a 64bit [rsi] value which will be moved to a 64 bit rbx register.
This could be explicitly stated in Nasm as
mov rbx,qword [rsi]
It implies the statement in the question
"rsi always points to 1 byte of location in memory"
is incorrect. It would be valid for following instruction:
mov rbx,byte [rsi]
Where first encountered byte pointed by 64 bit rsi address will become rbx least significant byte.
For case 2:
mov [rsi],rbx
Whole 64 bit rbx value is moved into memory pointed by 64 bit rsi address.
From the chat discussion I can draw conclusion OP's real doubt was about little endiannes affecting registers and memory.
exampleQuadWord dq 0102030405060708H ; this is represented in memory as 0807060504030201
exampleQWordWithByte dq 1 ; this is represented in memory as 0100000000000000
In a simplistic example
mov rbx,2
mov rsi, exampleQWordWithByte
mov [rsi],rbx
; exampleQWordWithByte in memory is now 0200000000000000

Related

Print newline with as little code as possible with NASM

I'm learning a bit of assembly for fun and I am probably too green to know the right terminology and find the answer myself.
I want to print a newline at the end of my program.
Below works fine.
section .data
newline db 10
section .text
_end:
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
mov rax, 60
mov rdi, 0
syscall
But I'm hoping to achieve the same result without defining the newline in .data. Is it possible to call sys_write directly with the byte you want, or must it always be done with a reference to some predefined data (which I assume is what mov rsi, newline is doing)?
In short, why can't I replace mov rsi, newline by mov rsi, 10?
You always need the data in memory to copy it to a file-descriptor. There is no system-call equivalent of C stdio fputc that takes data by value instead of by pointer.
mov rsi, newline puts a pointer into a register (with a huge mov r64, imm64 instruction). sys_write doesn't special-case size=1 and treat its void *buf arg as a char value if it's not a valid pointer.
There aren't any other system calls that would do the trick. pwrite and writev are both more complicated (taking a file offset as well as a pointer, or taking an array of pointer+length to gather the data in kernel space).
There is a lot you can do to optimize this for code-size, though. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
First, putting the newline character in static storage means you need to generate a static address in a register. Your options here are:
5-bytes mov esi, imm32 (only in Linux non-PIE executables, so static addresses are link-time constants and are known to be in the low 2GiB of virtual address space and thus work as 32-bit zero-extended or sign-extended)
7-byte lea rsi, [rel newline] Works everywhere, the only good option if you can't use the 5-byte mov-immediate.
10-byte mov rsi, imm64. This works even in PIE executables (e.g. if you link with gcc -nostdlib without -static, on a distro where PIE is the default.) But only via a runtime relocation fixup, and the code-size is terrible. Compilers never use this because it's not faster than LEA.
But like I said, we can avoid static addressing entirely: Use push to put immediate data on the stack. This works even if we need zero-terminated strings, because push imm8 and push imm32 both sign-extend the immediate to 64-bit. Since ASCII uses the low half of the 0..255 range, this is equivalent to zero-extension.
Then we just need to copy RSP to RSI, because push leave RSP pointing to the data that was pushed. mov rsi, rsp would be 3 bytes because it needs a REX prefix. If you were targeting 32-bit code or the x32 ABI (32-bit pointers in long mode) you could use 2-byte mov esi, esp. But Linux puts the stack pointer at top of user virtual address space, so on x86-64 that's 0x007ff..., right at the top of the low canonical range. So truncating a pointer to stack memory to 32 bits isn't an option; we'd get -EFAULT.
But we can copy a 64-bit register with 1-byte push + 1-byte pop. (Assuming neither register needs a REX prefix to access.)
default rel ; We don't use any explicit addressing modes, but no reason to leave this out.
_start:
push 10 ; \n
push rsp
pop rsi ; 2 bytes total vs. 3 for mov rsi,rsp
push 1 ; _NR_write call number
pop rax ; 3 bytes, vs. 5 for mov edi, 1
mov edx, eax ; length = call number by coincidence
mov edi, eax ; fd = length = call number also coincidence
syscall ; write(1, "\n", 1)
mov al, 60 ; assuming write didn't return -errno, replace the low byte and keep the high zeros
;xor edi, edi ; leave rdi = 1 from write
syscall ; _exit(1)
.size: db $ - _start
xor-zeroing is the most well-known x86 peephole optimization: it saves 3 bytes of code size, and is actually more efficient than mov edi, 0. But you only asked for the smallest code to print a newline, without specifying that it had to exit with status = 0. So we can save 2 bytes by leaving that out.
Since we're just making an _exit system call, we don't need to clean up the stack from the 10 we pushed.
BTW, this will crash if the write returns an error. (e.g. redirected to /dev/full, or closed with ./newline >&-, or whatever other condition.) That would leave RAX=-something, so mov al, 60 would give us RAX=0xffff...3c. Then we'd get -ENOSYS from the invalid call number, and fall off the end of _start and decode whatever is next as instructions. (Probably zero bytes which decode with [rax] as an addressing mode. Then we'd fault with a SIGSEGV.)
objdump -d -Mintel disassembly of that code, after building with nasm -felf64 and linking with ld
0000000000401000 <_start>:
401000: 6a 0a push 0xa
401002: 54 push rsp
401003: 5e pop rsi
401004: 6a 01 push 0x1
401006: 58 pop rax
401007: 89 c2 mov edx,eax
401009: 89 c7 mov edi,eax
40100b: 0f 05 syscall
40100d: b0 3c mov al,0x3c
40100f: 0f 05 syscall
0000000000401011 <_start.size>:
401011: 11 .byte 0x11
So the total code-size is 0x11 = 17 bytes. vs. your version with 39 bytes of code + 1 byte of static data. Your first 3 mov instructions alone are 5, 5, and 10 bytes long. (Or 7 bytes long for mov rax,1 if you use YASM which doesn't optimize it to mov eax,1).
Running it:
$ strace ./newline
execve("./newline", ["./newline"], 0x7ffd4e98d3f0 /* 54 vars */) = 0
write(1, "\n", 1
) = 1
exit(1) = ?
+++ exited with 1 +++
If this was part of a larger program:
If you already have a pointer to some nearby static data in a register, you could do something like a 4-byte lea rsi, [rdx + newline-foo] (REX.W + opcode + modrm + disp8), assuming the newline-foo offset fits in a sign-extended disp8 and that RDX holds the address of foo.
Then you can have newline: db 10 in static storage after all. (Put it .rodata or .data, depending on which section you already had a pointer to).
It expects an address of the string in rsi register. Not a character or string.
mov rsi, newline loads the address of newline into rsi.

Writing to allocated memory with sbrk results in segfault [duplicate]

I am trying to dynamically allocate memory into the heap and then assign values in those memory addresses. I understand how to allocate the memory but how would I assign for example the value in a register to that first dynamic memory address?
This is what I have so far:
push rbp
mov rbp, rsp ;initialize an empy stack to create activation records for the rest of the subroutines
mov rax, 0x2d ;linux system call for brk()
mov rbx, 0x0 ;to get the adress of the first adress we are allocating we must have 0 in rbx
int 0x80 ;calls the linux operating system kernel for assistance
mov [brk_firstLocation], rax ;the first position in the heap will be returned in rax thus i save the first loaction in a varable called brk_firstLocation
mov rbx, rax ;the memory adress of the start of the heap is moved in rbx
add rbx, 0x14 ;we want 5 bytes worth of data alocated in the heap, so the start adress plus 20 bits
mov rax, 0x2d ;linux system call for brk()
int 0x80 ;calls the linux operating system kernel for assistance
What would I do, for example, to mov the value in rax into brk_firstLocation
others have pointed out a few things that are wrong with your code. I would like to add that you would not add 20 bits to the current breakpoint (or 20 bytes like add rbx, 20 actually does), you would simply add 5 bytes.
Also, your first syscall argument will not be in rbx, it will be in rdi. The 64-bit syscall ABI uses different system call numbers, different registers, and a different instruction (syscall instead of int 0x80) than the 32-bit ABI (which is still available in 64-bit processes). See also the x86 tag wiki for more ABI links.
Here's how your code should look:
push rbp
mov rbp, rsp
;; sys_brk(0)
mov rax, 12 ; 12 is SYS_brk (/usr/include/asm/unistd_64.h)
mov rdi, 0 ; rdi for first syscall arg in the 64-bit ABI, not rbx
syscall ; syscall, not int 0x80, for the 64-bit ABI
mov qword [brk_firstLocation], rax
;; sys_brk(old_break + 5)
lea rdi, [rax + 5] ; add 5 bytes to the break point
mov rax, 12
syscall ; set the new breakpoint
At this point you can use brk_firstLocation as a pointer to whatever 5 byte struct you want to store on the heap. Here's how you would put values in that memory space:
mov rdi, [brk_firstLocation] ; load the pointer from memory, if you didn't already have it in a register
mov byte [rdi], 'A' ; a char in the first byte
mov [rdi+1], ecx ; a 32-bit value in the last 4 bytes.
int 80h is only for 32 bit system calls. Use syscall for 64 bit instead.
Calling sys_brk twice is redundant - in assembly you always know where your program data ends. Simply put a label there and you will have the address.
Allocating this way memory less than one page is pointless, it will be allocated in blocks of 4KB anyway.
It is important to be understood - sys_brk is not heap management function. It is low level memory management.
There are a number of problems I see:
0x2d is the brk system call on x86 (32 bit); on x86_64 it's 0xc
brk sets the end of the data segment; it returns 0 on success and -1 on failure. It does not return "the first position in the heap". That comes from the symbol _end which the linker sets to the end of the uninitialized preallocated data.
So you want something like:
mov [brk_firstloaction], _end
mov rbx, [brk_firstlocation]
add rbx, 0x14 ; space for 5 dwords (20 bytes)
mov rax, 12
int 0x80
As you have done, call once to retrieve the current bottom of heap, then move the top of heap (which is the brk value). However your code is not correct in using the 64-bit register set r.x. If your Linux is 32-bit (as implied by the use of 45 for the syscall number), then you want the 32-bit register set:
mov eax, 45 ; brk
mov ebx, 0 ; arg 1: 0 = fail returning brk value in rax
int 0x80 ; syscall
mov dword ptr [brk_firstLocation], rax ; save result
mov eax, 45 ; brk
mov ebx, 4 ; arg 1: allocate 4 bytes for a 32-bit int
int 0x80 ; syscall
mov eax, dword ptr [brk_firstLocation] ; reload rax with allocated memory address.
mov dword ptr [eax], 42 ; use result: store 42 in newly allocated storage
Of course you can re-load the saved value for re-use as many times as needed.

Getting confused about usage of labels in assembly for X86_64 linux: Why should we write mov [digit], al, but not mov digit, al?

Here's my code:
section .data
digit db 0,10
section .text
global _start
_start:
call _printRAXDigit
mov rax, 60
mov rdx, 0
syscall
_printRAXDigit:
add rax, 48
mov [digit], al
mov rax, 1
mov rdi, 1
mov rsi, digit
mov rdx, 2
syscall
ret
I have a question about the difference between [digit] and digit.
I have learned that labels (like digit in the code), represent the memory address of the data, and the operator "[]" acts like something to dereference the pointer, so it will load the value that the label points at to the destination.
For instance, mov rax, [digit] will throw 0 to the rax register because digit points at the first element of the data (in this case, the integer 0).
However, in my code, it works when I write mov [digit], al, which means "load the value stored in al to the memory address digit", but I have no idea why we should use "[]" in this case. The first argument of mov must be a destination (like a register or a memory address), so I think it should be mov digit, al rather than mov [digit], al. It doesn't make sense to me why we use a value to get the value from another place rather than use a memory address to get the value.
So that's all of my question. Please give me any response about where my thinking is wrong or any correction about my concept of labels.
In NASM syntax (there are assemblers which use different notation, e.g. MASM/TASM use a different flavor of Intel syntax, and gas uses AT&T syntax) the following x86 instructions ...
mov esi, someAddress
mov esi, [someAddress]
mov [someAddress], esi
mov someAddress, esi ; see below
... (would) have the following meaning:
mov esi, someAddress
Write the number that represents the address where someAddress is stored to the register esi. So if someAddress is stored at address 1234 the value 1234 is written to esi.
mov esi, [someAddress]
Write the content of the memory to esi. So if someAddress is stored at address 1234 and the value stored at address 1234 is 5678 the value 5678 is written to esi.
You might also say: The value of the variable someAddress (a variable normally is nothing but the content of the memory at a certain address) is written to the esi register.
mov [someAddress], esi
Write the content of esi to the memory at address someAddress.
You might also say: Write the value of esi to the variable someAddress.
mov someAddress, esi
Would mean: Change the constant number which represents the address someAddress to esi.
So if someAddress is located at address 1234 and esi contains the value 5678 the instruction would mean:
Change the mathematical constant 1234 in a way that 1234 = 5678 after that change.
This is of course stupid because the mathematical constants 1234 and 5678 will never be equal. For this reason the x86 CPU has no such instruction.
(There are CPUs having similar instructions. On the SPARC CPUs for example instructions assigning a value to the zero register (which means: "assign a value to the constant zero") are used if you only want to have the instruction's side effects - like setting the flags - but you are not interested in the result itself.)

Assembly Language nasm error

I have written the following assembly code as prescribed by my text book in the intel 64 bit syntax
Section .text
global _short
_start:
jmp short Gotocall
shellcode:
pop rcx
xor eax,eax
mov byte [rcx+8], al
lea rdi, [rax]
mov long [rcx+8], rdi
mov long [rcx+12], eax
mov byte al, 0x3b
mov rsi, rax
lea rdi, [esi+8]
lea edx, [esi+12]
int 0x80
Gotocall:
call shellcode
db '/bin/shJAAAAKKKK'
but i get a nasm error in line 10 like this
asmshell.asm:10: error: mismatch in operand sizes
Can anybody tell me what mistake is their in my code.
And can anybody please tell me some good references to the 64 bit intel assembly instructions.
If you mean the error is on line 10
mov long [rcx+8], rdi
I was about to ask you what size long qualifier is, but the next line
mov long [rcx+12], eax
shows that you are moving two different sizes of register to the same size destination. In the first case the 64-bit register rdi, in the second case the 32-bit register eax, and long cannot satisfy them both.
Why not just drop the long since by specifying the register, the assembler knows the size of the destination? But sadly, you have only allowed 4 bytes memory to store a 64-bit register, given away by the [rcx+8] followed by [rcx+12].
Perhaps you intended
mov long [rcx+8], edi

Assembly x86 brk() call use

I am trying to dynamically allocate memory into the heap and then assign values in those memory addresses. I understand how to allocate the memory but how would I assign for example the value in a register to that first dynamic memory address?
This is what I have so far:
push rbp
mov rbp, rsp ;initialize an empy stack to create activation records for the rest of the subroutines
mov rax, 0x2d ;linux system call for brk()
mov rbx, 0x0 ;to get the adress of the first adress we are allocating we must have 0 in rbx
int 0x80 ;calls the linux operating system kernel for assistance
mov [brk_firstLocation], rax ;the first position in the heap will be returned in rax thus i save the first loaction in a varable called brk_firstLocation
mov rbx, rax ;the memory adress of the start of the heap is moved in rbx
add rbx, 0x14 ;we want 5 bytes worth of data alocated in the heap, so the start adress plus 20 bits
mov rax, 0x2d ;linux system call for brk()
int 0x80 ;calls the linux operating system kernel for assistance
What would I do, for example, to mov the value in rax into brk_firstLocation
others have pointed out a few things that are wrong with your code. I would like to add that you would not add 20 bits to the current breakpoint (or 20 bytes like add rbx, 20 actually does), you would simply add 5 bytes.
Also, your first syscall argument will not be in rbx, it will be in rdi. The 64-bit syscall ABI uses different system call numbers, different registers, and a different instruction (syscall instead of int 0x80) than the 32-bit ABI (which is still available in 64-bit processes). See also the x86 tag wiki for more ABI links.
Here's how your code should look:
push rbp
mov rbp, rsp
;; sys_brk(0)
mov rax, 12 ; 12 is SYS_brk (/usr/include/asm/unistd_64.h)
mov rdi, 0 ; rdi for first syscall arg in the 64-bit ABI, not rbx
syscall ; syscall, not int 0x80, for the 64-bit ABI
mov qword [brk_firstLocation], rax
;; sys_brk(old_break + 5)
lea rdi, [rax + 5] ; add 5 bytes to the break point
mov rax, 12
syscall ; set the new breakpoint
At this point you can use brk_firstLocation as a pointer to whatever 5 byte struct you want to store on the heap. Here's how you would put values in that memory space:
mov rdi, [brk_firstLocation] ; load the pointer from memory, if you didn't already have it in a register
mov byte [rdi], 'A' ; a char in the first byte
mov [rdi+1], ecx ; a 32-bit value in the last 4 bytes.
int 80h is only for 32 bit system calls. Use syscall for 64 bit instead.
Calling sys_brk twice is redundant - in assembly you always know where your program data ends. Simply put a label there and you will have the address.
Allocating this way memory less than one page is pointless, it will be allocated in blocks of 4KB anyway.
It is important to be understood - sys_brk is not heap management function. It is low level memory management.
There are a number of problems I see:
0x2d is the brk system call on x86 (32 bit); on x86_64 it's 0xc
brk sets the end of the data segment; it returns 0 on success and -1 on failure. It does not return "the first position in the heap". That comes from the symbol _end which the linker sets to the end of the uninitialized preallocated data.
So you want something like:
mov [brk_firstloaction], _end
mov rbx, [brk_firstlocation]
add rbx, 0x14 ; space for 5 dwords (20 bytes)
mov rax, 12
int 0x80
As you have done, call once to retrieve the current bottom of heap, then move the top of heap (which is the brk value). However your code is not correct in using the 64-bit register set r.x. If your Linux is 32-bit (as implied by the use of 45 for the syscall number), then you want the 32-bit register set:
mov eax, 45 ; brk
mov ebx, 0 ; arg 1: 0 = fail returning brk value in rax
int 0x80 ; syscall
mov dword ptr [brk_firstLocation], rax ; save result
mov eax, 45 ; brk
mov ebx, 4 ; arg 1: allocate 4 bytes for a 32-bit int
int 0x80 ; syscall
mov eax, dword ptr [brk_firstLocation] ; reload rax with allocated memory address.
mov dword ptr [eax], 42 ; use result: store 42 in newly allocated storage
Of course you can re-load the saved value for re-use as many times as needed.

Resources