Boot time program running on virtual computer without OS

For a school assignment I have to write the program described below, and I would really like some help on how to approach this problem. To be clear, I don't want you to solve this; I just want some guidance on how to do it.
Problem:
Write a boot-time program that will run in a virtual computer without an operating system. The program has to print out your name and the words "ALT key is pressed" or "ALT key is not pressed" according to the status of the ALT key.
Additional hints:
- the program has to be written in 16-bit mode
- the compiled program, including its data, must be less than 510 bytes in size
- the directive "org 0x7c00" specifies the correct address in memory where the program is loaded
- write instructions before the data
- the program should execute in an endless loop
- there is no printf function; you will have to use interrupt 0x10
- to read the state of the ALT keys you can use interrupt 0x16
- to position the output text use interrupt 0x10
- the binary format of the executable should be "bin" (nasm -f bin -o boot.bin code.asm)
- resize the binary file to the size of a floppy disk (truncate -s 1474560 boot.bin)
- mark the binary file as a bootable disk: at location 0x1FE save the value 0x55 and at location 0x1FF save the value 0xAA (use a hex editor, for example ghex2)
- start the virtual machine with your binary file as a floppy disk (nice -n 19 qemu -fda boot.bin)
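For the ALT-key part, the BIOS keyboard service INT 16h with AH=02h ("get shift flags") returns the shift-status byte in AL, and bit 3 (mask 08h) of that byte is set while either Alt key is held down. The Python sketch below only models that decision so the bit test is easy to see; in the boot sector it would amount to a test al, 08h followed by a conditional jump to the right message. The name alt_message is made up for illustration.

```python
# Models the decision made after INT 16h, AH=02h returns the shift flags in AL.
# Bit 3 (mask 0x08) of the BIOS shift-flag byte means "an Alt key is held down".
ALT_MASK = 0x08

def alt_message(shift_flags: int) -> str:
    """Return the text the boot program should print for a given AL value."""
    if shift_flags & ALT_MASK:
        return "ALT key is pressed"
    return "ALT key is not pressed"

# 0x0C = Ctrl + Alt held, 0x20 = only Num Lock active
for al in (0x00, 0x08, 0x0C, 0x20):
    print(f"AL={al:#04x}: {alt_message(al)}")
```

Other bits (Shift, Ctrl, the lock states) share the same byte, which is why the sketch masks instead of comparing AL with 08h directly.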

I suggest you read this article on assembly bootloaders. Taken from that article, here is a hello world:
org 7C00h
jmp short Start ;Jump over the data (the 'short' keyword makes the jmp instruction smaller)
Msg: db "Hello World! "
EndMsg:
Start: mov bx, 000Fh ;Page 0, colour attribute 15 (white) for the int 10 calls below
mov cx, 1 ;We will want to write 1 character
xor dx, dx ;Start at top left corner
mov ds, dx ;Ensure ds = 0 (to let us load the message)
cld ;Ensure direction flag is cleared (for LODSB)
Print: mov si, Msg ;Loads the address of the first byte of the message, 7C02h in this case
;PC BIOS Interrupt 10 Subfunction 2 - Set cursor position
;AH = 2
Char: mov ah, 2 ;BH = page, DH = row, DL = column
int 10h
lodsb ;Load a byte of the message into AL.
;Remember that DS is 0 and SI holds the
;offset of one of the bytes of the message.
;PC BIOS Interrupt 10 Subfunction 9 - Write character and colour
;AH = 9
mov ah, 9 ;BH = page, AL = character, BL = attribute, CX = character count
int 10h
inc dl ;Advance cursor
cmp dl, 80 ;Wrap around edge of screen if necessary
jne Skip
xor dl, dl
inc dh
cmp dh, 25 ;Wrap around bottom of screen if necessary
jne Skip
xor dh, dh
Skip: cmp si, EndMsg ;If we're not at end of message,
jne Char ;continue loading characters
jmp Print ;otherwise restart from the beginning of the message
times 0200h - 2 - ($ - $$) db 0 ;Zerofill up to 510 bytes
dw 0AA55h ;Boot Sector signature
;OPTIONAL:
;To zerofill up to the size of a standard 1.44MB, 3.5" floppy disk
;times 1474560 - ($ - $$) db 0
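The assignment does the padding and signature steps with truncate and a hex editor, while this hello world puts the signature in the source with dw 0AA55h (on little-endian x86 that stores 55h at 0x1FE and AAh at 0x1FF). If you prefer to script the assignment's steps instead, here is a small Python sketch. make_floppy_image is a hypothetical helper, not part of any tool mentioned here, and it assumes NASM wrote a raw "bin" file.

```python
# Hypothetical helper: pads NASM's raw output to floppy size and writes the boot
# signature, doing the job of `truncate -s 1474560 boot.bin` plus the two hex edits.
FLOPPY_SIZE = 1_474_560      # 80 cylinders * 2 heads * 18 sectors * 512 bytes
SIGNATURE_OFFSET = 0x1FE     # bytes 0x1FE/0x1FF end the 512-byte boot sector

def make_floppy_image(path: str) -> None:
    with open(path, "rb") as f:
        data = bytearray(f.read())
    if len(data) > FLOPPY_SIZE:
        raise ValueError("image is larger than a 1.44 MB floppy")
    tail = bytes(data[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2])
    if tail not in (b"", b"\x00\x00", b"\x55\xaa"):
        # Code or data reaches 0x1FE and would be overwritten by the signature.
        raise ValueError("boot sector code/data must stay below 510 bytes")
    data.extend(bytes(FLOPPY_SIZE - len(data)))   # zero-fill, as truncate does
    data[SIGNATURE_OFFSET] = 0x55
    data[SIGNATURE_OFFSET + 1] = 0xAA
    with open(path, "wb") as f:
        f.write(data)
```

After nasm -f bin -o boot.bin code.asm, calling make_floppy_image("boot.bin") leaves a file that QEMU can boot with -fda, as in the hints.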

NASM ASSEMBLY - Print "Hello World"

I've created a string and turned it into an array. I loop through each index and move each character into the al register so it can be printed to the VGA screen. The problem is that it prints the right number of characters (the size of the string), but the characters themselves come out as gibberish. Can you please help me figure out what the problem is in the code? It will be highly appreciated.
org 0
bits 16
section .text
global _start
_start:
mov si, msg
loop:
inc si
mov ah, 0x0e
mov al, [si]
or al, al
jz end
mov bh, 0x00
int 0x10
jmp loop
end:
jmp .done
.done:
jmp $
msg db 'Hello, world!',0xa
len equ $ - msg
TIMES 510 - ($ - $$) db 0
DW 0xAA55
Bootloader code:
ORG 0x7c00
BITS 16
boot:
mov ah, 0x02
mov al, 0x01
mov ch, 0x00
mov cl, 0x02
mov dh, 0x00
mov dl, 0x00
mov bx, 0x1000
mov es, bx
int 0x13
jmp 0x1000:0x00
times 510 - ($ - $$) db 0
dw 0xAA55

The bootloader
Before tackling the kernel code, let's look at the bootloader that brings the kernel in memory.
You have written a very minimalistic version of a bootloader, one that omits much of the usual stuff like setting up segment registers, but thanks to its reduced nature that's not really a problem.
What could be a problem is that you wrote mov dl, 0x00, hardcoding a zero to select the first floppy as your bootdisk. No problem if this is indeed the case, but it would be much better to just use whatever value the BIOS preloaded the DL register with. That's the ID for the disk that holds your bootloader and kernel.
What is a problem is that you load the kernel to the segmented address 0x1000:0x1000 and then later jump to the segmented address 0x1000:0x0000 which is 4096 bytes short of the kernel. You got lucky that the kernel code did run in the end, thanks to the memory between these two addresses most probably being filled with zero-bytes that (two by two) translate into the instruction add [bx+si], al. Because you omitted setting up the DS segment register, we don't know what unlucky byte got overwritten so many times. Let's hope it was not an important byte...
mov bx, 0x1000
mov es, bx
xor bx, bx              ; <== You forgot to write this instruction!
int 0x13
jmp 0x1000:0x0000
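The 4096-byte gap comes straight from real-mode address translation, linear = segment * 16 + offset. A quick check in Python (the function name linear is illustrative; it ignores the 1 MiB wrap-around, which does not matter for these addresses):

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset (no 1 MiB wrap-around)."""
    return (segment << 4) + offset

loaded_at = linear(0x1000, 0x1000)   # ES:BX when BX was left at 0x1000
jumped_to = linear(0x1000, 0x0000)   # target of jmp 0x1000:0x0000
print(hex(loaded_at), hex(jumped_to), loaded_at - jumped_to)   # 0x11000 0x10000 4096
```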
What is a problem is that you ignore the possibility of running into trouble when loading a sector from the disk. At the very least you should inspect the carry flag that BIOS.ReadSector function 02h reports and, if the flag is set, abort cleanly. A more sophisticated approach would also retry a limited number of times, say 3 times.
ORG 0x7C00
BITS 16
; IN (dl)
mov dh, 0x00 ; DL is bootdrive
mov cx, 0x0002
mov bx, 0x1000
mov es, bx
xor bx, bx
mov ax, 0x0201 ; BIOS.ReadSector
int 0x13 ; -> AH CF
jc ERR
jmp 0x1000:0x0000
ERR:
cli
hlt
jmp ERR
times 510 - ($ - $$) db 0
dw 0xAA55
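With CX=0x0002 and DH=0, the read starts at cylinder 0, head 0, sector 2. Because BIOS sector numbers start at 1, that is the second 512-byte sector of the disk, i.e. the kernel placed directly after the boot sector. A Python sketch of the usual CHS-to-LBA formula, assuming the standard 1.44 MB floppy geometry (2 heads, 18 sectors per track); other media have different geometry.

```python
# Standard 3.5" 1.44 MB floppy geometry (assumed here; other disks differ).
HEADS = 2
SECTORS_PER_TRACK = 18
SECTOR_SIZE = 512

def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    """Convert a BIOS CHS address (sectors numbered from 1) to a 0-based LBA."""
    if not 1 <= sector <= SECTORS_PER_TRACK:
        raise ValueError("sector numbers run from 1 to SECTORS_PER_TRACK")
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

# CH=0, DH=0, CL=2 as set up before int 0x13
lba = chs_to_lba(0, 0, 2)
print(lba, lba * SECTOR_SIZE)   # 1 512
```

Since AL=1, only that one sector is read, so the kernel has to fit in 512 bytes unless AL is raised.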
The kernel
After the jmp 0x1000:0x0000 instruction has brought you to the first instruction of your kernel, the CS code segment register holds the value 0x1000. None of the other segment registers changed, and since you did not set up any of them in the bootloader, we still don't know what any of them contain. However, in order to retrieve the bytes of the message at msg with the mov al, [si] instruction, we need a correct value in the DS data segment register. In accordance with the ORG 0 directive, the correct value is the one we already have in CS. Just two 1-byte instructions are needed: push cs and pop ds.
There's more to be said about the kernel code:
The printing loop uses a pre-increment on the pointer in the SI register. Because of this the first character of the string will not get displayed. You could compensate for this via mov si, msg - 1.
The printing loop processes a zero-terminated string. You don't need to prepare that len equate. What you do need is an explicit zero byte that terminates the string. You should not rely on the large number of zero bytes that times produced. In some future version of the code there might be no zero byte at all!
You (think you) have included a newline (0xa) in the string. For the BIOS.Teletype function 0Eh, this is merely a linefeed that moves down on the screen. To obtain a newline, you need to include both carriage return (13) and linefeed (10).
There's no reason for your kernel code to have the bootsector signature bytes at offset 510. Depending on how you get this code onto the disk, it might be necessary to pad the code up to (a multiple of) 512 bytes, so use times 512 - ($ - $$) db 0 instead.
The kernel:
ORG 0
BITS 16
section .text
global _start
_start:
push cs
pop ds
mov si, msg
mov bx, 0x0007 ; DisplayPage=0, GraphicsColor=7 (light gray)
jmp BeginLoop
PrintLoop:
mov ah, 0x0E ; BIOS.Teletype
int 0x10
BeginLoop:
mov al, [si]
inc si
test al, al
jnz PrintLoop
cli
hlt
jmp $-2
msg db 'Hello, world!', 13, 10, 0
TIMES 512 - ($ - $$) db 0
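To see why the pre-increment drops the first character, here is a small Python model of the two loops walking over the message bytes. It is only a sketch: it ignores segment registers entirely (so it cannot show the DS problem) and assumes the string ends with an explicit 0 byte, as in the corrected msg line. The function names are illustrative.

```python
def original_loop(memory: bytes, si: int) -> str:
    """Mimics the question's loop: inc si happens before mov al, [si]."""
    out = []
    while True:
        si += 1                 # inc si
        al = memory[si]         # mov al, [si]
        if al == 0:             # or al, al / jz end
            return "".join(out)
        out.append(chr(al))     # int 0x10 with AH=0Eh

def fixed_loop(memory: bytes, si: int) -> str:
    """Mimics the corrected kernel: load, then increment, then test."""
    out = []
    while True:
        al = memory[si]         # mov al, [si]
        si += 1                 # inc si
        if al == 0:             # test al, al / jnz PrintLoop
            return "".join(out)
        out.append(chr(al))

msg = b"Hello, world!\r\n\x00"      # 13, 10, 0 as in the corrected msg line
print(repr(original_loop(msg, 0)))  # 'ello, world!\r\n'
print(repr(fixed_loop(msg, 0)))     # 'Hello, world!\r\n'
```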

Print newline with as little code as possible with NASM

I'm learning a bit of assembly for fun and I am probably too green to know the right terminology and find the answer myself.
I want to print a newline at the end of my program.
Below works fine.
section .data
newline db 10
section .text
_end:
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
mov rax, 60
mov rdi, 0
syscall
But I'm hoping to achieve the same result without defining the newline in .data. Is it possible to call sys_write directly with the byte you want, or must it always be done with a reference to some predefined data (which I assume is what mov rsi, newline is doing)?
In short, why can't I replace mov rsi, newline by mov rsi, 10?

You always need the data in memory to copy it to a file-descriptor. There is no system-call equivalent of C stdio fputc that takes data by value instead of by pointer.
mov rsi, newline puts a pointer into a register (with a huge mov r64, imm64 instruction). sys_write doesn't special-case size=1 and treat its void *buf arg as a char value if it's not a valid pointer.
There aren't any other system calls that would do the trick. pwrite and writev are both more complicated (taking a file offset as well as a pointer, or taking an array of pointer+length to gather the data in kernel space).
There is a lot you can do to optimize this for code-size, though. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
First, putting the newline character in static storage means you need to generate a static address in a register. Your options here are:
5-byte mov esi, imm32 (only in Linux non-PIE executables, so static addresses are link-time constants and are known to be in the low 2GiB of virtual address space and thus work as 32-bit zero-extended or sign-extended)
7-byte lea rsi, [rel newline] Works everywhere, the only good option if you can't use the 5-byte mov-immediate.
10-byte mov rsi, imm64. This works even in PIE executables (e.g. if you link with gcc -nostdlib without -static, on a distro where PIE is the default.) But only via a runtime relocation fixup, and the code-size is terrible. Compilers never use this because it's not faster than LEA.
But like I said, we can avoid static addressing entirely: Use push to put immediate data on the stack. This works even if we need zero-terminated strings, because push imm8 and push imm32 both sign-extend the immediate to 64-bit. Since ASCII uses the low half of the 0..255 range, this is equivalent to zero-extension.
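Concretely, the qword written by push 10 is the byte 0x0A followed by seven zero bytes, so RSP ends up pointing at a one-character string that is already zero-terminated. A quick Python check of that layout; it models only the sign-extension and the little-endian store, and push_imm8 is an illustrative name.

```python
import struct

def push_imm8(value: int) -> bytes:
    """Bytes that `push imm8` leaves at [rsp] in 64-bit mode (sign-extended, little-endian)."""
    if not -128 <= value <= 127:
        raise ValueError("push imm8 only encodes -128..127")
    return struct.pack("<q", value)

slot = push_imm8(10)
print(slot)   # b'\n\x00\x00\x00\x00\x00\x00\x00'
```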
Then we just need to copy RSP to RSI, because push leaves RSP pointing to the data that was pushed. mov rsi, rsp would be 3 bytes because it needs a REX prefix. If you were targeting 32-bit code or the x32 ABI (32-bit pointers in long mode) you could use the 2-byte mov esi, esp. But Linux puts the stack pointer at the top of user virtual address space, so on x86-64 that's 0x007ff..., right at the top of the low canonical range. So truncating a pointer to stack memory to 32 bits isn't an option; we'd get -EFAULT.
But we can copy a 64-bit register with 1-byte push + 1-byte pop. (Assuming neither register needs a REX prefix to access.)
default rel ; We don't use any explicit addressing modes, but no reason to leave this out.
_start:
push 10 ; \n
push rsp
pop rsi ; 2 bytes total vs. 3 for mov rsi,rsp
push 1 ; __NR_write call number
pop rax ; 3 bytes, vs. 5 for mov eax, 1
mov edx, eax ; length = call number by coincidence
mov edi, eax ; fd = length = call number also coincidence
syscall ; write(1, "\n", 1)
mov al, 60 ; assuming write didn't return -errno, replace the low byte and keep the high zeros
;xor edi, edi ; leave rdi = 1 from write
syscall ; _exit(1)
.size: db $ - _start
xor-zeroing is the most well-known x86 peephole optimization: it saves 3 bytes of code size, and is actually more efficient than mov edi, 0. But you only asked for the smallest code to print a newline, without specifying that it had to exit with status = 0. So we can save 2 bytes by leaving that out.
Since we're just making an _exit system call, we don't need to clean up the stack from the 10 we pushed.
BTW, this will crash if the write returns an error. (e.g. redirected to /dev/full, or closed with ./newline >&-, or whatever other condition.) That would leave RAX=-something, so mov al, 60 would give us RAX=0xffff...3c. Then we'd get -ENOSYS from the invalid call number, and fall off the end of _start and decode whatever is next as instructions. (Probably zero bytes which decode with [rax] as an addressing mode. Then we'd fault with a SIGSEGV.)
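The mov al, 60 problem is easy to see by modelling a write to AL as replacing only the low byte of RAX. In this sketch -9 stands in for -EBADF, which is what write returns when fd 1 has been closed (as with ./newline >&-); mov_al is an illustrative name, not a real API.

```python
MASK64 = (1 << 64) - 1

def mov_al(rax: int, imm8: int) -> int:
    """Model `mov al, imm8`: replace only the low byte of RAX, keep bits 8..63."""
    return ((rax & MASK64) & ~0xFF) | (imm8 & 0xFF)

ok = mov_al(1, 60)          # write() returned 1 (one byte written)
failed = mov_al(-9, 60)     # write() returned -EBADF
print(ok, hex(failed))      # 60 0xffffffffffffff3c
```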
objdump -d -Mintel disassembly of that code, after building with nasm -felf64 and linking with ld:
0000000000401000 <_start>:
401000: 6a 0a push 0xa
401002: 54 push rsp
401003: 5e pop rsi
401004: 6a 01 push 0x1
401006: 58 pop rax
401007: 89 c2 mov edx,eax
401009: 89 c7 mov edi,eax
40100b: 0f 05 syscall
40100d: b0 3c mov al,0x3c
40100f: 0f 05 syscall
0000000000401011 <_start.size>:
401011: 11 .byte 0x11
So the total code-size is 0x11 = 17 bytes. vs. your version with 39 bytes of code + 1 byte of static data. Your first 3 mov instructions alone are 5, 5, and 10 bytes long. (Or 7 bytes long for mov rax,1 if you use YASM which doesn't optimize it to mov eax,1).
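The 39-byte figure can be checked by adding up the instruction lengths, assuming NASM's default optimization (mov rax, 1 and mov rdi, 0 are assembled as 5-byte mov r32, imm32, while mov rsi, newline stays a 10-byte mov r64, imm64):

```python
# Instruction sizes for the question's version, as NASM encodes them by default.
question = {
    "mov rax, 1": 5, "mov rdi, 1": 5, "mov rsi, newline": 10, "mov rdx, 1": 5,
    "syscall (write)": 2, "mov rax, 60": 5, "mov rdi, 0": 5, "syscall (exit)": 2,
}
# Byte counts read off the objdump listing of the optimized version.
answer = [2, 1, 1, 2, 1, 2, 2, 2, 2, 2]
print(sum(question.values()), sum(answer))   # 39 17
```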
Running it:
$ strace ./newline
execve("./newline", ["./newline"], 0x7ffd4e98d3f0 /* 54 vars */) = 0
write(1, "\n", 1
) = 1
exit(1) = ?
+++ exited with 1 +++
If this was part of a larger program:
If you already have a pointer to some nearby static data in a register, you could do something like a 4-byte lea rsi, [rdx + newline-foo] (REX.W + opcode + modrm + disp8), assuming the newline-foo offset fits in a sign-extended disp8 and that RDX holds the address of foo.
Then you can have newline: db 10 in static storage after all. (Put it in .rodata or .data, depending on which section you already had a pointer to.)

sys_write expects the address of the data in the rsi register, not a character or string.
mov rsi, newline loads the address of newline into rsi.

How do I reverse a string in emu8086 assembly language [duplicate]

I have to write a simple calculator in assembly using EMU8086, but every time I try to run it, EMU8086 gives this error:
INT 21h, AH=09h -
address: 170B5
byte 24h not found after 2000 bytes.
; correct example of INT 21h/9h:
mov dx, offset msg
mov ah, 9
int 21h
ret
msg db "Hello$"
I checked the other stuff, but there were no mistakes:
data segment
choice db ?
snum1 db 4 dup(?)
snum2 db 4 dup(?)
sres db 4 dup(?)
num1 db ?
num2 db ?
res db ?
;;menu1 db "Chose a function to procced", 10, 13, "Add [+]", 10, 13, "Sub [-]", 10, 13
;;menu2 db "Mul [*]", 10, 13, "Div [/]", 10, 13, "Mod [%]", 10, 13, "Pow [^]", 10, 13, "Exit [x]$"
messStr db "Enter Your Choice:",10,13,"",10,13,"Add --> +",10,13,"Sub --> -",10,13,"Mul --> *",10,13,"Div --> /",10,13,"Mod --> %",10,13,"Pow --> ^",10,13,"Exit --> X",10,13,"$"
msg1 db "Enter first number$"
msg2 db "Enter second number$"
msg3 db "Press any key to procced$"
msg4 db "The result is $"
ends
stack segment
dw 128 dup(0)
ends
code segment
assume cs:code, ds:data, ss:stack
newline proc ;; new line
push ax
push dx
mov ah, 2
mov DL, 10
int 21h
mov ah, 2
mov DL, 13
int 21h
pop dx
pop ax
ret
endp
printstr proc ;; print string
push BP
mov BP, SP
push dx
push ax
mov dx, [BP+4]
mov ah, 9
int 21h
pop ax
pop dx
pop BP
ret 2
endp
inputstr proc ;; collect input
push BP
mov BP, SP
push bx
push ax
mov bx, [BP+4]
k1:
mov ah, 1
int 21h
cmp al, 13
je sofk
mov [bx], al
inc bx
jmp k1
sofk:
mov byte ptr [bx], '$'
pop ax
pop bx
pop BP
ret 2
endp
getNums proc ;; get the numbers
call newline
push offset msg1
call printstr
call newline
push offset snum1
call inputstr
call newline
push offset msg2
call printstr
call newline
push offset snum2
call inputstr
ret
endp
start:
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
;; print the main menu
call newline
push offset msg4
call printstr
;; collect the input
call newline
mov bx, offset choice
mov ah, 1
int 21h
mov [bx], al
;; check it
mov al, choice
cmp al, '+'
jne cexit
call getNums
jmp cont
cexit:
cmp al, 'x'
je cend
cont:
;; pause before going to the main menu
call newline
push offset msg3
call printstr
mov bx, offset choice
mov ah, 1
int 21h
call newline
call newline
call newline
jmp start
cend:
mov ax, 4c00h
int 21h
ends
end start
I cut most of the code segment because it wasn't important here.
After experimenting with the code, I found that the problem was related to the lengths of the messages in the data segment: menu1 & menu2 were too long, and any message after them can't be printed (msg1 & msg2 are printed, but nothing after them). I tried merging menu1 & menu2, but it didn't help. Please help me find out what is wrong with it.

The error message means you use int 21h / AH=09h on a string that didn't end with a $ (ASCII 24h). The system-call handler checked 2000 bytes without finding one.
Often, that means your code or data is buggy, e.g. in a fixed string you forgot a $ at the end, or, when copying bytes into a buffer, you overwrote the '$' or never stored one in the first place.
But in this case, it appears that EMU8086 has a bug assembling push offset msg4 (in a way that truncates the 16-bit address 00B5h to 8 bits and sign-extends it back to 16, creating a wrong pointer that points past where any $ characters are in your data).
Based on the error message below I know you are using EMU8086 as your development environment.
INT 21h, AH=09h -
address: 170B5
byte 24h not found after 2000 bytes.
; correct example of INT 21h/9h:
mov dx, offset msg
mov ah, 9
int 21h
ret
msg db "Hello$"
I'm no expert on EMU8086 by any stretch of the imagination. I do know why your offsets don't work. I can't tell you if there is a proper way to resolve this, or if it's an EMU8086 bug. Someone with a better background on this emulator would know.
You have created a data segment with some variables. It seems okay to me (but I may be missing something). I decided to load up EMU8086 to actually try this code. It assembled without error. Using the debugger, I single-stepped to the push offset msg1 line near the beginning of the program, and I knew right away from the instruction encoding what was going on. The debugger showed the instruction as push 0b5h, where 0b5h is the offset. The trouble is that it is encoded as a push imm8. The instruction bytes shown in the debugger's left-hand pane were:
6A B5
If you look up PUSH in an instruction set reference, you'll find the encoding with opcode 6A listed as:
Opcode*  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
6A ib    PUSH imm8    I      Valid        Valid            Push imm8.
You may say that B5 fits within a byte (imm8) so what is the problem? The smallest value that can be pushed onto the stack with push in 16-bit mode is a 16-bit word. Since a byte is smaller than a word, the processor takes the byte and sign extends it to make a 16-bit value. The instruction set reference actually says this:
If the source operand is an immediate of size less than the operand size, a sign-extended value is pushed on the stack
B5 is binary 10110101. The sign bit is the leftmost bit. Since it is 1, the upper 8 bits placed onto the stack will be 11111111b (FF). If the sign bit were 0, then 00000000b would be placed in the upper 8 bits. The emulator didn't place 00B5 onto the stack; it placed FFB5. That is incorrect! Stepping through the push 0b5h instruction and inspecting the stack in the debugger confirms it: the value placed on the stack is FFB5. I could not find an appropriate syntax (even using the word modifier) to force EMU8086 to encode this as push imm16. A push imm16 would encode the entire word as push 00b5, which would work.
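Whether an offset survives the 6A encoding depends only on bit 7 of its low byte. A small Python model of the word that push imm8 stores in 16-bit mode, for offsets on either side of 128 (push_imm8_16bit is an illustrative name):

```python
def push_imm8_16bit(imm8: int) -> int:
    """Word stored by `push imm8` (opcode 6A) in 16-bit mode: the byte sign-extended to 16 bits."""
    imm8 &= 0xFF
    return imm8 | 0xFF00 if imm8 & 0x80 else imm8

for offset in (0x7F, 0x80, 0xB5):
    print(f"offset {offset:#06x} -> pushed {push_imm8_16bit(offset):#06x}")
# offset 0x007f -> pushed 0x007f
# offset 0x0080 -> pushed 0xff80
# offset 0x00b5 -> pushed 0xffb5
```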
There are two things you can do. First, you can place 256 bytes of dummy data in your data segment like this:
data segment
db 256 dup(?)
choice db ?
... rest of data
Why does this work? Every variable defined after the dummy data will have an offset that can't be represented in a single byte. Because of this, EMU8086 is forced to encode push offset msg1 as a word push.
The cleaner solution is to use the LEA instruction. This is the load effective address instruction. It takes a memory operand and computes the address (in this case the offset relative to the data segment). You can replace all your code that uses offset with something like:
lea ax, [msg1]
push ax
AX can be any of the general purpose 16-bit registers. Once in a register, push the 16-bit register onto the stack.
Someone may have a better solution for this, or know a way to resolve this. If so please feel free to comment.
Given the information above, you may ask why it seemed to work when you moved the data around. The reason is that the way you reorganized the strings (placing the long one last) caused all the variables to start at offsets less than 128. Because of this, the PUSH of an 8-bit immediate offset sign-extended a 0 into the top bits when placed on the stack, so the offsets were correct. Once the offsets are >= 128 (and < 256), the sign bit is 1 and the value placed on the stack will have its upper 8 bits set to 1 rather than 0.
There are other bugs in your program; I'm concentrating on the issue directly related to the error you are receiving.

I reviewed your code and concentrated on the following sequence of instructions:
mov bx, offset choice ; here you set BX to the address of 'choice'
mov ah, 1
int 21h ; here you 'READ CHARACTER FROM STANDARD INPUT, WITH ECHO'
mov [bx], al ; because INT 21h does preserve BX, you are writing back the result of the interrupt call (AL) back to the memory location at BX, which is named 'choice'
;; check it
mov al, choice ; HERE you are moving a BYTE variable named 'choice' to AL, overwriting the result of the last INT 21h call
cmp al, '+' ; ... and compare this variable to the ASCII value of '+'
jne cexit ; if this variable is unequal to '+' you jump to 'cexit'
call getNums ; otherwise you try to get another number from the input/STANDARD CONSOLE
So your sequence
mov bx, offset choice ; here you set BX to the address of 'choice'
...
mov [bx], al ; because INT 21h does preserve BX, you ...
...
mov al, choice
essentially means that you are setting BX to the address of 'choice', then setting 'choice' ([BX]) to AL and copying it back to AL.
This is redundant.
After that, you compare that char to '+' and...
if that char equals '+', you get the next char with call getNums and then continue with cont:.
if that char does not equal '+', you compare it to 'x', the exit-char. If it's not 'x', you fall through to cont:
No error here.
So your problem with menu1 and menu2 may stem from some escape characters included in your strings, such as %, / and \. For example, % is a macro character in some assemblers, which may cause problems.

A simple solution is that your strings should always end in '$':
change DUP(?) to DUP('$'), and end all other strings with ,'$'.

NASM: How to create/handle basic bmp file using intel 64 bit assembly?

How do I create/handle simple bmp file filling it with one color only using intel 64 bit assembly and nasm assembler?

The steps that include such operation are:
Create bmp file header with fixed values (explanation of specific fields below)
Create buffer which contains enough space - three bytes per pixel (one color = red + green + blue)
Open/create file
Fill the buffer
Write header to file
Write buffer to file
Close file
Exit program
Ad. 2: This is a bit more tricky: if the number of bytes per row (3 per pixel) is not divisible by 4, each row has to be padded with extra bytes up to the next multiple of 4. This program fills those padding bytes with 0xFF. Here I purposely created a 201x201 picture. In this example we have 3*201=603 bytes per row, meaning that we need one additional byte per row. Because of this, the size required for the picture buffer is 604*201=121404.
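The padding rule depends on the byte count of a row, not the pixel count. A Python sketch of the same fill that the LOOPY/LOOPX loops in the program perform, with the row stride rounded up to a multiple of 4. It defaults the padding byte to zero, the usual choice, while pad=0xFF reproduces the assembly program's output; row_stride and fill_pixels are illustrative names.

```python
def row_stride(width: int) -> int:
    """Bytes per row for a 24-bit BMP: 3 bytes per pixel, rounded up to a multiple of 4."""
    return (3 * width + 3) & ~3

def fill_pixels(width: int, height: int, bgr=(0x00, 0xFF, 0xFF), pad=0x00) -> bytearray:
    """Build the pixel array row by row: B, G, R per pixel, then the row padding."""
    stride = row_stride(width)
    buf = bytearray()
    for _ in range(height):
        for _ in range(width):
            buf += bytes(bgr)
        buf += bytes([pad]) * (stride - 3 * width)
    return buf

pixels = fill_pixels(201, 201)
print(row_stride(201), len(pixels))   # 604 121404
```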
The source code that answers questions:
section .text
global _start ;must be declared for linker (ld)
_start: ;tell linker entry point
;#######################################################################
;### This program creates empty bmp file - 64 bit version ##############
;#######################################################################
;### main ##############################################################
;#######################################################################
; open file
mov rax,85 ;system call number - open/create file
mov rdi,msg ;file name
;flags
mov rsi,111111111b ;mode
syscall ;call kernel
; save file descriptor
mov r8, rax
; write headline to file
mov rax, 1 ;system call number - write
mov rdi, r8 ;load file desc
mov rsi, bmpheadline ;load address of buffer to write
mov rdx, 54 ;load number of bytes
syscall ;call kernel
mov rbx, 201 ;LOOPY counter
mov rdx, empty_space ;load address of buffer (space allocated for picture pixels)
LOOPY:
mov rcx, 201 ;LOOPX counter
LOOPX:
mov byte [rdx+0], 0x00 ;BLUE
mov byte [rdx+1], 0xFF ;GREEN
mov byte [rdx+2], 0xFF ;RED
dec rcx ;decrease counter_x
add rdx, 3 ;move address pointer by 3 bytes (1 pixel = 3 bytes, which we just have written)
cmp rcx, 0 ;check if counter is 0
jne LOOPX ;if not jump to LOOPX
dec rbx ;decrease counter_y
mov byte [rdx], 0xFF ;additional byte per row
inc rdx ;increase address
cmp rbx, 0 ;check if counter is 0
jne LOOPY ;if not jump to LOOPY
; write content to file
mov rax, 1 ;system call number - write
mov rdi, r8 ;load file desc
mov rsi, empty_space ;load address of buffer to write
mov rdx, 121404 ;load number of bytes
syscall ;call kernel
; close file
mov rax, 3 ;system call number - close
mov rdi, r8 ;load file desc
syscall ;call kernel
; exit program
mov rax,60 ;system call number - exit
syscall ;call kernel
section .data
msg db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
bmpheadline db 0x42,0x4D,0x72,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x36,0x00,0x00,0x00,0x28,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0xC9,0x00,0x00,0x00,0x01,0x00,0x18,0x00,0x00,0x00,0x00,0x00,0x3C,0xDA,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
section .bss ;this section is responsible for preallocated block of memory of fixed size
empty_space: resb 121404 ;preallocation of 121404 bytes
Here is the explanation of the BMP header (more at this link: http://www.dragonwins.com/domains/getteched/bmp/bmpfileformat.htm )
;### File Header - 14 bytes
;#######################################################################
;### bfType, 2 bytes, The characters "BM"
;### 0x42,0x4D = "B","M"
;###
;### bfSize, 4 bytes, The size of the file in bytes
;### 0x72,0xDA,0x01,0x00 => 0x00,0x01,0xDA,0x72 = 0x1DA72 = 121458 bytes
;### 121458 = 54 + 201 * (201 * 3 + 1)
;###
;### Comment:
;### We want to create file 201x201, that means 201 rows and 201 columns
;### meaning each row will take 201*3 = 603 bytes
;###
;### According to the BMP file specification each such row must be padded
;### so its size is divisible by 4, which gives us plus 1 byte for each
;### row.
;###
;###
;### bfReserved1, 2 bytes, Unused - must be zero
;### 0x00,0x00
;###
;### bfReserved2, 2 bytes, Unused - must be zero
;### 0x00,0x00
;###
;### bfOffBits, 4 bytes, Offset to start of Pixel Data
;### 0x36,0x00,0x00,0x00 = 54 bytes
;###
;### Image Header - 40 bytes
;#######################################################################
;### biSize 4 Header Size - Must be at least 40
;### 0x28,0x00,0x00,0x00 = 40
;###
;### biWidth 4 Image width in pixels
;### 0xC9,0x00,0x00,0x00 = 201
;###
;### biHeight 4 Image height in pixels
;### 0xC9,0x00,0x00,0x00 = 201
;###
;### biPlanes 2 Must be 1
;### 0x01,0x00
;###
;### biBitCount 2 Bits per pixel - 1, 4, 8, 16, 24, or 32
;### 0x18,0x00 = 24
;###
;### biCompression 4 Compression type (0 = uncompressed)
;### 0x00,0x00,0x00,0x00
;###
;### biSizeImage 4 Image Size - may be zero for uncompressed images
;### 0x3C,0xDA,0x01,0x00 => 0x00,0x01,0xDA,0x3C = 121404 bytes
;###
;### biXPelsPerMeter 4 Preferred resolution in pixels per meter
;### 0x00,0x00,0x00,0x00
;###
;### biYPelsPerMeter 4 Preferred resolution in pixels per meter
;### 0x00,0x00,0x00,0x00
;###
;### biClrUsed 4 Number Color Map entries that are actually used
;### 0x00,0x00,0x00,0x00
;###
;### biClrImportant 4 Number of significant colors
;### 0x00,0x00,0x00,0x00
;###
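Since every field is little-endian, the 54 header bytes can be cross-checked with Python's struct module. This sketch (bmp_header_24bit is an illustrative name) only handles uncompressed 24-bit images; for 201x201 it reproduces the bmpheadline bytes exactly.

```python
import struct

def bmp_header_24bit(width: int, height: int) -> bytes:
    """14-byte file header + 40-byte info header for an uncompressed 24-bit BMP."""
    stride = (3 * width + 3) & ~3          # rows padded to a multiple of 4 bytes
    image_size = stride * height           # biSizeImage
    offset = 14 + 40                       # bfOffBits: pixel data follows both headers
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    info_header = struct.pack("<IiiHHIIiiII",
                              40, width, height, 1, 24,   # biSize, biWidth, biHeight, biPlanes, biBitCount
                              0, image_size,              # biCompression (0 = uncompressed), biSizeImage
                              0, 0, 0, 0)                 # resolution and colour-table fields unused
    return file_header + info_header

print(bmp_header_24bit(201, 201).hex())
```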

Here's an improved version of rbraun's answer. This should really be a Q&A over on codereview.SE >.<
I decided to post a separate answer instead of an edit, but feel free to copy any of this back into that answer if you want. I've tested this for a few different row/column sizes, and it works.
I improved the comments, as well as optimizing a bit. Comments like "call kernel" are too obvious to bother writing; that's just noise. I changed the comments on the system calls to more clearly say what was going on. e.g. it looks like you're calling sys_open, but you're actually using sys_creat. That means there is no flags arg, even though you mention it in a comment.
I also parameterized the BMP header and the loops, so the program works for any assemble-time values of BMPcols and BMProws with no extra run-time overhead. If the row width is a multiple of 4 bytes without padding, it leaves out the padding store and increment instructions altogether.
For very large buffers, it would make a lot of sense to use multiple write() calls on a buffer that ends at the end of a row, so you can reuse it. e.g. any multiple of lcm(4096, row_bytes) would be good, since it holds a whole number of rows. Around 128 KiB is maybe a good size, because L2 cache size in Intel CPUs since Nehalem is 256 KiB, so the data can hopefully stay hot in L2 while the kernel repeatedly memcpys it into the pagecache. You definitely want the buffer to be significantly smaller than the last-level cache size.
Changes from original:
Fixed file-creation mode: don't set the execute bits, just read/write. Use octal like a normal person.
Improve comments, as discussed above: be more explicit about what system calls we're making. Avoid re-stating what's already clear from the asm instructions.
Demonstrate RIP-relative addressing for static objects
Put static constant data in .rodata. We don't need a .data section/segment at all.
Used 32-bit operand size where possible, especially for putting small constants in registers. (And note that mov-immediate is not really a "load").
Improved loop idiom: dec / jnz with no separate CMP.
Parameterized on BMProws / BMPcols, and defined assemble-time constants for various sizes instead of hard-coding. The assembler can do the math for you, so take advantage of it.
Define the BMP header with separately named dd items, instead of a no-longer-meaningful block of bytes with db.
Make only one write() system call: copy the BMP header into the buffer first. A 54 byte memcpy is much faster than an extra syscall.
Save some instructions by not repeating the setup of args for system calls when they're already there.
Merged the three byte stores for pixel components into one dword store. These stores overlap, but that's fine.
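Why overlapping dword stores give the same pixels as three byte stores: each 4-byte store spills one byte into the next pixel, and the next store overwrites it. A Python sketch comparing both approaches on one row; it allocates one slack byte for the final spill, and a real buffer needs the same room (row padding or extra space) after the last pixel. The function names are illustrative.

```python
import struct

def row_with_byte_stores(width: int, b: int, g: int, r: int) -> bytes:
    """Three separate byte stores per pixel, like the original program."""
    buf = bytearray(3 * width)
    for i in range(width):
        buf[3 * i:3 * i + 3] = bytes((b, g, r))
    return bytes(buf)

def row_with_dword_stores(width: int, b: int, g: int, r: int) -> bytes:
    """One little-endian 4-byte store per pixel, advancing by only 3 bytes each time."""
    buf = bytearray(3 * width + 1)            # +1 slack byte for the last store's spill
    dword = b | (g << 8) | (r << 16)          # 4th byte is 0; it gets overwritten anyway
    for i in range(width):
        struct.pack_into("<I", buf, 3 * i, dword)
    return bytes(buf[:3 * width])

print(row_with_byte_stores(201, 0x00, 0xFF, 0xFF) == row_with_dword_stores(201, 0x00, 0xFF, 0xFF))
```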
DEFAULT REL ; default to RIP-relative addressing for static data
;#######################################################################
;### This program creates empty bmp file - 64 bit version ##############
section .rodata ; read-only data is the right place for these, not .data
BMPcols equ 2019
BMProws equ 2011
; 3 bytes per pixel, with each row padded to a multiple of 4B
BMPpixbytes equ BMProws * ((3 * BMPcols + 3) & ~0x3)
ALIGN 16 ; for efficient rep movs
bmpheader:
;; BMP is a little-endian format, so we can use dd and stuff directly instead of encoding the bytes ourselves
bfType: dw "BM"
bfSize: dd BMPpixbytes + bmpheader_len ; size of file in bytes
dd 0 ; reserved
bfOffBits: dd bmpheader_len ; yes we can refer to stuff that's defined later.
biSize: dd 40 ; header size, min = 40
biWidth: dd BMPcols
biHeight: dd BMProws
biPlanes: dw 1 ; must be 1
biBitCount: dw 24 ; bits per pixel: 1, 4, 8, 16, 24, or 32
biCompression: dd 0 ; uncompressed = 0
biSizeImage: dd BMPpixbytes ; Image Size - may be zero for uncompressed images
biXPelsPerMeter: dd 0 ; Preferred resolution in pixels per meter
biYPelsPerMeter: dd 0 ; Preferred resolution in pixels per meter
biClrUsed: dd 0 ; Number of color map entries that are actually used
biClrImportant: dd 0 ; Number of significant colors
bmpheader_len equ $ - bmpheader ; Let the assembler calculate this for us. Should be 54. `$` is the current position
; output filename is hard-coded. Checking argc / argv is left as an exercise for the reader.
; Of course it would be even easier to be more Unixy and just always write to stdout, so the user could redirect
fname db 'filename.bmp',0x00 ;name of out file, 0x00 = end of string
section .bss ;this section is responsible for fixed size preallocated blocks
bmpbuf: resb 54 + BMPpixbytes ; static buffer big enough to hold the whole file (including header).
bmpbuf_len equ $ - bmpbuf
section .text
global _start ;make the symbol externally visible
_start: ;The linker looks for this symbol to set the entry point
;#######################################################################
;### main ##############################################################
; creat(fname, 0666)
mov eax,85 ; SYS_creat from /usr/include/x86_64-linux-gnu/asm/unistd_64.h
;mov edi, fname ;file name string. Static data is always in the low 2G, so you can use 32bit immediates.
lea rdi, [fname] ; file name, PIC version. We don't need [rel fname] since we used DEFAULT REL.
; Ubuntu 16.10 defaults to enabling position-independent executables that can use ASLR, but doesn't require it the way OS X does.
;creat doesn't take flags. It's equivalent to open(path, O_CREAT|O_WRONLY|O_TRUNC, mode).
mov esi, 666o ;mode in octal, to be masked by the user's umask
syscall ; eax = fd or -ERRNO
test eax,eax ; error checking on system calls.
js .handle_error ; We don't print anything, so run under strace to see what happened.
;;; memcpy the BMP header to the start of our buffer.
;;; SSE loads/stores would probably be more efficient for such a small copy
mov edi, bmpbuf
mov esi, bmpheader
;Alternative: rep movsd or movsq may be faster.
;mov ecx, bmpheader_len/4 + 1 ; It's not a multiple of 4, but copy extra bytes because MOVSD is faster
mov ecx, bmpheader_len
rep movsb
; edi now points to the first byte after the header, where pixels should be stored
; mov edi, bmpbuf+bmpheader_len might let out-of-order execution get started on the rest while rep movsb is still running, but I'm not sure.
;######### main loop
mov ebx, BMProws
.LOOPY: ; do{
mov ecx, BMPcols ; The %if block after this loop decides at assemble time whether each row needs padding, so arbitrary widths should work.
.LOOPX: ; do{
mov dword [rdi], (0xFF <<16) | (0xFF <<8) | 0x00 ;RED=FF, GREEN=FF, BLUE=00
; stores one extra byte, but we overlap it with the next store
add rdi, 3 ;move address pointer by 3 bytes (1 pixel = 3 bytes, which we have just written)
dec ecx
jne .LOOPX ; } while(--x != 0)
; end of inner loop
%if ((BMPcols * 3) % 4) != 0
; Pad the row to a multiple of 4B
mov dword [rdi], 0xFFFFFFFF ; might only need a byte or word store, but another dword store that we overlap is fine as long as it doesn't go off the end of the buffer
add rdi, 4 - (BMPcols * 3) % 4 ; advance to a 4B boundary
%endif
dec ebx
jne .LOOPY ; } while(--y != 0)
;##### Write out the buffer to the file
; fd is still where we left it in RAX.
; write and close calls both take it as the first arg,
; and the SYSCALL ABI only clobbers RAX, RCX, and R11, so we can just put it in EDI once.
mov edi, eax ; fd
; write content to file: write(fd, bmpbuf, bmpbuf_len)
mov eax, 1 ;SYS_write
lea rsi, [bmpbuf] ;buffer.
; We already have enough info in registers that reloading this stuff as immediate constants isn't necessary, but it's much more readable and probably at least as efficient anyway.
mov edx, bmpbuf_len
syscall
; close file
mov eax, 3 ;SYS_close
; fd is still in edi
syscall
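.handle_error:
; exit program with status -eax: eax is close's return value (0 or -errno), or -errno from a failed creat
mov edi, eax
neg edi ; exit status = 0 on success, errno on failure
mov eax, 60 ;system call number - exit
syscall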
I used RIP-relative LEA sometimes, and absolute addressing (mov r32, imm32) sometimes for the static data. This is silly; really I should have just picked one and used it everywhere. (And if I picked absolute non-PIC, so I know the address is definitely in the low 31 bits of virtual address space, I could take advantage of that everywhere, e.g. add edi, 3 instead of add rdi, 3.)
See my comments on the original answer for more optimization suggestions. I didn't implement anything more than the most basic thing of combining the three byte-stores into one dword store. Unrolling so you can use wider stores would help a lot, but this is left as an exercise for the reader.
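To check the assembly version's output, it can help to generate the same file another way and compare the two with cmp. This is a hedged Python sketch, not from the original answer. It assumes the same 24-bit yellow fill (bytes 00 FF FF in BGR order) and 0xFF row padding that the assembly stores. build_bmp and write_bmp are hypothetical helper names. Like the assembly, it builds the header and pixels in one buffer and opens the file with the flags that creat() implies.

```python
import os
import struct


def build_bmp(cols: int, rows: int, bgr: bytes = b"\x00\xff\xff") -> bytes:
    """Build a solid-color 24-bit BMP (header + pixels) in one buffer."""
    stride = (3 * cols + 3) & ~3          # row bytes rounded up to 4
    pad = stride - 3 * cols
    pixbytes = rows * stride
    header_len = 54
    header = struct.pack(
        "<2sIIIIiiHHIIiiII",
        b"BM", header_len + pixbytes, 0, header_len,      # BITMAPFILEHEADER
        40, cols, rows, 1, 24, 0, pixbytes, 0, 0, 0, 0,   # BITMAPINFOHEADER
    )
    # Padding bytes are 0xFF to match the assembly's padding store.
    row = bgr * cols + b"\xff" * pad
    return header + row * rows


def write_bmp(path: str, data: bytes) -> None:
    """creat(path, 0666) is open(path, O_CREAT|O_WRONLY|O_TRUNC, 0666)."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o666)
    try:
        view = memoryview(data)
        while view:                  # normally a single write() call
            n = os.write(fd, view)
            view = view[n:]
    finally:
        os.close(fd)
```

For example, run write_bmp("reference.bmp", build_bmp(2019, 2011)) and then cmp reference.bmp filename.bmp. If the two programs agree, cmp prints nothing.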

How do I test my bootloader on a floppy disk

Here is my code:
http://pastebin.com/pSncVNPK
[BITS 16] ;Tells the assembler that it's 16-bit code
[ORG 0x7C00] ;Origin: tells the assembler where the code will
;be in memory after it has been loaded
XOR AX, AX ;The BIOS doesn't guarantee DS = 0, so set it explicitly
MOV DS, AX ;so that [SI] addresses match ORG 0x7C00
MOV SI, HelloString ;Store string pointer to SI
CALL PrintString ;Call print string procedure
JMP $ ;Infinite loop, hang it here.
PrintCharacter: ;Procedure to print character on screen
;Assume that ASCII value is in register AL
MOV AH, 0x0E ;Tell BIOS that we need to print one character on screen.
MOV BH, 0x00 ;Page no.
MOV BL, 0x07 ;Text attribute 0x07 is lightgrey font on black background
INT 0x10 ;Call video interrupt
RET ;Return to calling procedure
PrintString: ;Procedure to print string on screen
;Assume that string starting pointer is in register SI
next_character: ;Label to fetch next character from string
MOV AL, [SI] ;Get a byte from string and store in AL register
INC SI ;Increment SI pointer
OR AL, AL ;Check if value in AL is zero (end of string)
JZ exit_function ;If end then return
CALL PrintCharacter ;Else print the character which is in AL register
JMP next_character ;Fetch next character from string
exit_function: ;End label
RET ;Return from procedure
;Data
HelloString db 'Hello World', 0 ;HelloWorld string ending with 0
TIMES 510 - ($ - $$) db 0 ;Fill the rest of sector with 0
DW 0xAA55 ;Add boot signature at the end of bootloader
As you can see, the syntax appears to be correct, and I compiled it into a .bin file, but I'm trying to figure out how to test it. Please treat me like I'm a bit slow, because I've spent HOURS googling this topic and nothing seems to work; I've even tried using a hex editor as per some tutorial, but it didn't work. The closest I've gotten is with these instructions:
http://puu.sh/6KzUo.png
from this link: How to make an bootable iso(not cd or flash drive) for testing your own boot loader?
Except I don't quite understand step 6, because VirtualBox won't let me select the .img file as a bootable disk.
Thanks!

If you just need to add a floppy disk to the floppy controller, this is how to do it:
Click on the Floppy Controller. An icon of a floppy with a green plus sign should come up on the left of your selection. Click on this small icon.
A dialog should now come up.
Select "Choose Disk".
The file selection box will come up. At this point, choose your .img file.
From this point you should be able to boot the virtual machine from the floppy disk and test your bootloader.
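If VirtualBox still refuses the file, check the image itself. The sketch below is a hedged Python helper; make_floppy_image is a hypothetical name. It assumes the emulator wants a raw, full-size 1.44 MB floppy image. It first checks that the assembled boot.bin is exactly one 512-byte sector ending in the 0x55 0xAA signature, which is what TIMES 510 - ($ - $$) db 0 followed by DW 0xAA55 produces. It then pads the sector with zeros to 1,474,560 bytes and writes it out under the name you give, for example one with an .img extension.

```python
SECTOR_SIZE = 512
FLOPPY_SIZE = 1_474_560  # 1.44 MB: 80 cylinders * 2 heads * 18 sectors * 512 bytes


def make_floppy_image(bin_path: str, img_path: str) -> None:
    """Validate a raw boot sector and pad it into a 1.44 MB floppy image."""
    with open(bin_path, "rb") as f:
        sector = f.read()
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(sector)}")
    # DW 0xAA55 is stored little-endian, so offset 0x1FE holds 0x55, 0x1FF holds 0xAA.
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA boot signature at offset 0x1FE")
    with open(img_path, "wb") as f:
        f.write(sector)
        f.write(b"\x00" * (FLOPPY_SIZE - SECTOR_SIZE))
```

For example, call make_floppy_image("boot.bin", "boot.img"), then attach boot.img as described above. If QEMU is installed, qemu-system-i386 -fda boot.img boots the same image without VirtualBox.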
