NASM - Use labels for code loaded from disk - nasm

As a learning experience, I'm writing a boot-loader for BIOS in NASM on 16-bit real mode in my x86 emulator, Qemu.
BIOS loads your boot-sector at address 0x7C00. NASM assumes you start at 0x0, so your labels are useless unless you do something like specify the origin with [org 0x7C00] (or presumably other techniques). But, when you load the 2nd stage boot-loader, its RAM origin is different, which complicates the hell out of using labels in that newly loaded code.
What's the recommended way to deal with this? It this linker territory? Should I be using segment registers instead of org?
Thanks in advance!
p.s. Here's the code that works right now:
[bits 16]
[org 0x7c00]
LOAD_ADDR: equ 0x9000 ; This is where I'm loading the 2nd stage in RAM.
start:
mov bp, 0x8000 ; set up the stack
mov sp, bp ; relatively out of the way
call disk_load ; load the new instructions
; at 0x9000
jmp LOAD_ADDR
%include "disk_load.asm"
times 510 - ($ - $$) db 0
dw 0xaa55 ;; end of bootsector
seg_two:
;; this is ridiculous. Better way?
mov cx, LOAD_ADDR + print_j - seg_two
jmp cx
jmp $
print_j:
mov ah, 0x0E
mov al, 'k'
int 0x10
jmp $
times 2048 db 0xf

You may be making this harder than it is (not that this is trivial by any means!)
Your labels work fine and will continue to work fine. Remember that, if you look under the hood at the machine code generated, your short jumps (everything after seg_two in what you've posted) are relative jumps. This means that the assembler doesn't actually need to compute the real address, it simply needs to calculate the offset from the current opcode. However, when you load your code into RAM at 0x9000, that is an entirely different story.
Personally, when writing precisely the kind of code that you are, I would separate the code. The boot sector stops at the dw 0xaa55 and the 2nd stage gets its own file with an ORG 0x9000 at the top.
When you compile these to object code you simply need to concatenate them together. Essentially, that's what you're doing now except that you are getting the assembler to do it for you.
Hope this makes sense. :)

Related

Is it possible for a program to read itself?

Theoretical question. But let's say I have written an assembly program. I have "labelx:" I want the program to read at this memory address and only this size and print to stdout.
Would it be something like
jmp labelx
And then would i then use the Write syscall , making sure to read from the next instruction from labelx:
mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall
to then output to stdout.
However how would I obtain the size to read itself? Especially if there is a
label after the code i want to read or code after. Would I have to manually
count the lines?
mov rdx,rip+(bytes*lines)
And then syscall with populated registers for the syscall to write to from rsi to rdi. Being stdout.
Is this Even possible? Would i have to use the read syscall first, as the write system call requires rsi to be allocated memory buffer. However I assumed .text is already allocated memory and is read only. Would I have to allocate onto the stack or heap or a static buffer first before write, if it's even possible in the first place?
I'm using NASM syntax btw. And pretty new to assembly. And just a question.
Yes, the .text section is just bytes in memory, no different from section .rodata where you might normally put msg: db "hello", 10. x86 is a Von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. Use objdump -drwC -Mintel on a linked executable to see the machine-code bytes, or GDB's x command in a running process, to see bytes anywhere.
You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using mov edx, prog_end - prog_start in the code at the point where you want that size in RDX.
See How does $ work in NASM, exactly? for more about subtracting two labels (in the same section) to get a size. (Where $ is an implicit label at the start of the current line, although $ isn't likely what you want here.)
To get the current address into a register, you need a RIP-relative LEA, not mov, because RIP isn't a general-purpose register and there's no special form of mov that reads it.
here:
lea rsi, [rel here] ; with DEFAULT REL you could just use [here]
mov edi, 1 ; stdout fileno
mov edx, .end - here ; assemble-time constant size calculation
mov eax, 1 ; __NR_write
syscall
.end:
This is fully position-independent, unlike if you used mov esi, here. (How to load address of function or label into register)
The LEA could use lea rsi, [rel $] to assemble to the same machine-code bytes, but you want a label there so you can subtract them.
I optimized your MOV instructions to use 32-bit operand-size, implicitly zero-extending into the full 64-bit RDX and RAX. (And RDI, but write(int fd, void *buf, size_t len) only looks at EDI anyway for the file descriptor).
Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, put the start/end labels anywhere. (e.g. foo: and .end:, and mov edx, foo.end - foo taking advantage of how NASM local labels work, by appending to the previous non-local label, so you can reference them from somewhere else. Or just give them both non-dot names.)

How to comprehend the flow of this assembly code

I can' t understand how this works.
Here's a part of main() program disassembled by objdump and written in intel notation
0000000000000530 <main>:
530: lea rdx,[rip+0x37d] # 8b4 <_IO_stdin_used+0x4>
537: mov DWORD PTR [rsp-0xc],0x0
53f: movabs r10,0xedd5a792ef95fa9e
549: mov r9d,0xffffffcc
54f: nop
550: mov eax,DWORD PTR [rsp-0xc]
554: cmp eax,0xd
557: ja 57c <main+0x4c>
559: movsxd rax,DWORD PTR [rdx+rax*4]
55d: add rax,rdx
560: jmp rax
The rodata section dump:
.rodata
08b0 01000200 ecfdffff d4fdffff bcfdffff ................
08c0 9cfdffff 7cfdffff 6cfdffff 4cfdffff ....|...l...L...
08d0 3cfdffff 2cfdffff 0cfdffff ecfcffff <...,...........
08e0 d4fcffff b4fcffff 0cfeffff ............
In 530, rip is [537] so [rdx] = [537 + 37d] = 8b4.
First question is the value of rdx is how large? Is the valueis ec, or ecfdffff or something else? If it has DWORD, I can understand that has 'ecfdffff' (even this is wrong too?:() but this program don't declare it. How can I judge the value?
Then the program continues.
In 559, rax is first appeared.
The second question is this rax can interpret as a part of eax and in this time is the rax = 0? If rax is 0, in 559 means rax = DWORD[rdx] and the value of rax become ecfdffff and next [55d] do rax += rdx, and I think this value can't jamp. There must be something wrong, so tell me where, or how i make any wrongs.
I think I'll diverge from what Peter discussed (he provides good information) and get to the heart of some issues I think are causing you problems. When I first glanced at this question I assumed that the code was likely compiler generated and the jmp rax was likely the result of some control flow statement. The most likely way to generate such a code sequence is via a C switch. It isn't uncommon for a switch statement to be made of a jump table to say what code should execute depending on the control variable. As an example: the control variable for switch(a) is a.
This all made sense to me, and I wrote up a number of comments (now deleted) that ultimately resulted in bizarre memory addresses that jmp rax would go to. I had errands to run but when I returned I had the aha moment that you may have had the same confusion I did. This output from objdump using the -s option appeared as:
.rodata
08b0 01000200 ecfdffff d4fdffff bcfdffff ................
08c0 9cfdffff 7cfdffff 6cfdffff 4cfdffff ....|...l...L...
08d0 3cfdffff 2cfdffff 0cfdffff ecfcffff <...,...........
08e0 d4fcffff b4fcffff 0cfeffff ............
One of your questions seems to be about what values get loaded here. I never used the -s option to look at data in the sections and was unaware that although the dump splits the data out in groups of 4 bytes (32-bit values) they are shown in byte order as it appears in memory. I had at first assumed the output was displaying these values from Most Significant Byte to Least significant byte and objdump -s had done the conversion. That is not the case.
You have to manually reverse the bytes of each group of 4 bytes to get the real value that would be read from memory into a register.
ecfdffff in the output actually means ec fd ff ff. As a DWORD value (32-bit) you need to reverse the bytes to get the HEX value as you would expect when loaded from memory. ec fd ff ff reversed would be ff ff fd ec or the 32-bit value 0xfffffdec. Once you realize that then this makes a lot more sense. If you make this same adjustment for all the data in that table you'd get:
.rodata
08b0: 0x00020001 0xfffffdec 0xfffffdd4 0xfffffdbc
08c0: 0xfffffd9c 0xfffffd7c 0xfffffd6c 0xfffffd4c
08d0: 0xfffffd3c 0xfffffd2c 0xfffffd0c 0xfffffcec
08e0: 0xfffffcd4 0xfffffcb4 0xfffffe0c
Now if we look at the code you have it starts with:
530: lea rdx,[rip+0x37d] # 8b4 <_IO_stdin_used+0x4>
This doesn't load data from memory, it is computing the effective address of some data and places the address in RDX. The disassembly from OBJDUMP is displaying the code and data with the view that it is loaded in memory starting at 0x000000000000. When it is loaded into memory it may be placed at some other address. GCC in this case is producing position independent code (PIC). It is generated in such a way that the first byte of the program can start at an arbitrary address in memory.
The # 8b4 comment is the part we are concerned about (you can ignore the information after that). The disassembly is saying if the program was loaded at 0x0000000000000000 then the value loaded into RDX would be 0x8b4. How was that arrived at? This instruction starts at 0x530 but with RIP relative addressing the RIP (instruction pointer) is relative to the address just after the current instruction. The address the disassembler used was 0x537 (the byte after the current instruction is the address of the first byte of the next instruction). The instruction adds 0x37d to RIP and gets 0x537+0x37d=0x8b4. The address 0x8b4 happens to be in the .rodata section which you are given a dump of (as discussed above).
We now know that RDX contains the base of some data. The jmp rax suggests this is likely going to be a table of 32-bit values that are used to determine what memory location to jump to depending on the value in the control variable of a switch statement.
This statement appears to be storing the value 0 as a 32-bit value on the stack.
537: mov DWORD PTR [rsp-0xc],0x0
These appear to be variables that the compiler chose to store in registers (rather than memory).
53f: movabs r10,0xedd5a792ef95fa9e
549: mov r9d,0xffffffcc
R10 is being loaded with the 64-bit value 0xedd5a792ef95fa9e. R9D is the lower 32-bits of the 64-bit R9 register.The value 0xffffffcc is being loaded into the lower 32-bits of R9 but there is something else occurring. In 64-bit mode if the destination of an instruction is a 32-bit register the CPU automatically zero extends the value into the upper 32-bits of the register. The CPU is guaranteeing us that the upper 32-bits are zeroed.
This is a NOP and doesn't do anything except align the next instruction to memory address 0x550. 0x550 is a value that is 16-byte aligned. This has some value and may hint that the instruction at 0x550 may be the first instruction at the top of a loop. An optimizer may place NOPs into the code to align the first instruction at the top of a loop to a 16-byte aligned address in memory for performance reasons:
54f: nop
Earlier the 32-bit stack based variable at rsp-0xc was set to zero. This reads the value 0 from memory as a 32-bit value and stores it in EAX. Since EAX is a 32-bit register being used as the destination for the instruction the CPU automatically filled the upper 32-bits of RAX to 0. So all of RAX is zero.
550: mov eax,DWORD PTR [rsp-0xc]
EAX is now being compared to 0xd. If it is above (ja) it goes to the instruction at 0x57c.
554: cmp eax,0xd
557: ja 57c <main+0x4c>
We then have this instruction:
559: movsxd rax,DWORD PTR [rdx+rax*4]
The movsxd is an instruction that will take a 32-bit source operand (in this case the 32-bit value at memory address RDX+RAX*4) load it into the bottom 32-bits of RAX and then sign extend the value into the upper 32-bits of RAX. Effectively if the 32-bit value is negative (the most significant bit is 1) the upper 32-bits of RAX will be set to 1. If the 32-bit value is not negative the upper 32-bits of RAX will be set to 0.
When this code is first encountered RDX contains the base of some table at 0x8b4 from the beginning of the program loaded in memory. RAX is set to 0. Effectively the first 32-bits in the table are copied to RAX and sign extended. As seen earlier the value at offset 0xb84 is 0xfffffdec. That 32-bit value is negative so RAX contains 0xfffffffffffffdec.
Now to the meat of the situation:
55d: add rax,rdx
560: jmp rax
RDX still holds the address to the beginning of a table in memory. RAX is being added to that value and stored back in RAX (RAX = RAX+RDX). We then JMP to the address stored in RAX. So this code all seems to suggest we have a JUMP table with 32-bit values that we are using to determine where we should go. So then the obvious question. What are the 32-bit values in the table? The 32-bit values are the difference between the beginning of the table and the address of the instruction we want to jump to.
We know the table is 0x8b4 from the location our program is loaded in memory. The C compiler told the linker to compute the difference between 0x8b4 and the address where the instruction we want to execute resides. If the program had been loaded into memory at 0x0000000000000000 (hypothetically), RAX = RAX+RDX would have resulted in RAX being 0xfffffffffffffdec + 0x8b4 = 0x00000000000006a0. We then use jmp rax to jump to 0x6a0. You didn't show the entire dump of memory but there is going to be code at 0x6a0 that will execute when the value passed to the switch statement is 0. Each 32-bit value in the JUMP table will be a similar offset to the code that will execute depending on the control variable in the switch statement. If we add 0x8b4 to all the entries in the table we get:
08b0: 0x000006a0 0x00000688 0x00000670
08c0: 0x00000650 0x00000630 0x00000620 0x00000600
08d0: 0x000005F0 0x000005e0 0x000005c0 0x000005a0
08e0: 0x00000588 0x00000568 0x000006c0
You should find that in the code you haven't provided us that these addresses coincide with code that appears after the jmp rax.
Given that the memory address 0x550 was aligned, I have a hunch that this switch statement is inside a loop that keeps executing as some kind of state machine until the proper conditions are met for it to exit. Likely the value of the control variable used for the switch statement is changed by the code in the switch statement itself. Each time the switch statement is run the control variable has a different value and will do something different.
The control variable for the switch statement was originally checked for the value being above 0x0d (13). The table starting at 0x8b4 in the .rodata section has 14 entries. One can assume the switch statement probably has 14 different states (cases).
but this program don't declare it
You're looking at disassembly of machine code + data. It's all just bytes in memory. Any labels the disassembler does manage to show are ones that got left in the executable's symbol table. They're irrelevant to how the CPU runs the machine code.
(The ELF program headers tell the OS's program loader how to map it into memory, and where to jump to as an entry point. This has nothing to do with symbols, unless a shared library references some globals or functions defined in the executable.)
You can single-step the code in GDB and watch register values change.
In 559, rax is first appeared.
EAX is the low 32 bits of RAX. Writing to EAX zero-extends into RAX implicitly. From mov DWORD PTR [rsp-0xc],0x0 and the later reload, we know that RAX=0.
This must have been un-optimized compiler output (or volatile int idx = 0; to defeat constant propagation), otherwise it would know at compile time that RAX=0 and could optimize away everything else.
lea rdx,[rip+0x37d] # 8b4
A RIP-relative LEA puts the address of static into a register. It's not a load from memory. (That happens later when movsxd with an indexed addressing mode uses RDX as the base address.)
The disassembler worked out the address for you; it's RDX = 0x8b4. (Relative to the start of the file; when actually running the program would be mapped at a virtual address like 0x55555...000)
554: cmp eax,0xd
557: ja 57c <main+0x4c>
559: movsxd rax,DWORD PTR [rdx+rax*4]
55d: add rax,rdx
560: jmp rax
This is a jump table. First it checks for an out-of-bounds index with cmp eax,0xd, then it indexes a table of 32-bit signed offsets using EAX (movsxd with an addressing mode that scales RAX by 4), and adds that to the base address of the table to get a jump target.
GCC could just make a jump table of 64-bit absolute pointers, but chooses not to so that .rodata is position-independent as well and doesn't need load-time fixups in a PIE executable. (Even though Linux does support doing that.) See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 where this is discussed (although the main focus of that bug is that gcc -fPIE can't turn a switch into a table lookup of string addresses, and actually still uses a jump table)
The jump-offset table address is in RDX, this is what was set up with the earlier LEA.

Why does VC++ 2010 often use ebx as a "zero register"?

Yesterday I was looking at some 32 bit code generated by VC++ 2010 (most probably; don't know about the specific options, sorry) and I was intrigued by a curious recurring detail: in many functions, it zeroed out ebx in the prologue, and it always used it like a "zero register" (think $zero on MIPS). In particular, it often:
used it to zero out memory; this is not unusual, as the encoding for a mov mem,imm is 1 to 4 bytes bigger than mov mem,reg (the full immediate value size has to be encoded even for 0), but usually (gcc) the necessary register is zeroed out "on demand", and kept for more useful purposes otherwise;
used it for compares against zero - as in cmp reg,ebx. This is what stroke me as really unusual, as it should be exactly the same as test reg,reg, but adds a dependency to an extra register. Now, keep in mind that this happened in non-leaf functions, with ebx being often pushed (by the callee) on and off the stack, so I would not trust this dependency to be always completely free. Also, it also used test reg,reg in the exact same fashion (test/cmp => jg).
Most importantly, registers on "classic" x86 are a scarce resource, if you start having to spill registers you waste a lot of time for no good reason; why waste one through all the function just to keep a zero in it? (still, thinking about it, I don't remember seeing much register spillage in functions that used this "zero-register" pattern).
So: what am I missing? Is it a compiler blooper or some incredibly smart optimization that was particularly interesting in 2010?
Here's an excerpt:
; standard prologue: ebp/esp, SEH, overflow protection, ... then:
xor ebx, ebx
mov [ebp+4], ebx ; zero out some locals
mov [ebp], ebx
call function_1
xor ecx, ecx ; ebx _not_ used to zero registers
cmp eax, ebx ; ... but used for compares?! why not test eax,eax?
setnz cl ; what? it goes through cl to check if eax is not zero?
cmp ecx, ebx ; still, why not test ecx,ecx?
jnz function_body
push 123456
call throw_something
function_body:
mov edx, [eax]
mov ecx, eax ; it's not like it was interested in ecx anyway...
mov eax, [edx+0Ch]
call eax ; virtual method call; ebx is preserved but possibly pushed/popped
lea esi, [eax+10h]
mov [ebp+0Ch], esi
mov eax, [ebp+10h]
mov ecx, [eax-0Ch]
xor edi, edi ; ugain, registers are zeroed as usual
mov byte ptr [ebp+4], 1
mov [ebp+8], ecx
cmp ecx, ebx ; why not test ecx,ecx?
jg somewhere
label1:
lea eax, [esi-10h]
mov byte ptr [ebp+4], bl ; ok, uses bl to write a zero to memory
lea ecx, [eax+0Ch]
or edx, 0FFFFFFFFh
lock xadd [ecx], edx
dec edx
test edx, edx ; now it's using the regular test reg,reg!
jg somewhere_else
Notice: an earlier version of this question said that it used mov reg,ebx instead of xor ebx,ebx; this was just me not remembering stuff correctly. Sorry if anybody put too much thought trying to understand that.
Everything you commented on as odd looks sub-optimal to me. test eax,eax sets all flags (except AF) the same as cmp against zero, and is preferred for performance and code-size.
On P6 (PPro through Nehalem), reading long-dead registers is bad because it can lead to register-read stalls. P6 cores can only read 2 or 3 not-recently-modified architectural registers from the permanent register file per clock (to fetch operands for the issue stage: the ROB holds operands for uops, unlike on SnB-family where it only holds references to the physical register file).
Since this is from VS2010, Sandybridge wasn't released yet, so it should have put a lot of weight on tuning for Pentium II/III, Pentium-M, Core2, and Nehalem where reading "cold" registers is a possible bottleneck.
IDK if anything like this ever made sense for integer regs, but I don't know much about optimizing for CPUs older than P6.
The cmp / setz / cmp / jnz sequence looks particularly braindead. Maybe it came from a compiler-internal canned sequence for producing a boolean value from something, and it failed to optimize a test of the boolean back into just using the flags directly? That still doesn't explain the use of ebx as a zero-register, which is also completely useless there.
Is it possible that some of that was from inline-asm that returned a boolean integer (using a silly that wanted a zero in a register)?
Or maybe the source code was comparing two unknown values, and it was only after inlining and constant-propagation that it turned into a compare against zero? Which MSVC failed to optimize fully, so it still kept 0 as a constant in a register instead of using test?
(the rest of this was written before the question included code).
Sounds weird, or like a case of CSE / constant-hoisting run amok. i.e. treating 0 like any other constant that you might want to load once and then reg-reg copy throughout the function.
Your analysis of the data-dependency behaviour is correct: moving from a register that was zeroed a while ago essentially starts a new dependency chain.
When gcc wants two zeroed registers, it often xor-zeroes one and then uses a mov or movdqa to copy to the other.
This is sub-optimal on Sandybridge where xor-zeroing doesn't need an execution port, but a possible win on Bulldozer-family where mov can run on the AGU or ALU, but xor-zeroing still needs an ALU port.
For vector moves, it's a clear win on Bulldozer: handled in register rename with no execution unit. But xor-zeroing of an XMM or YMM register still needs an execution port on Bulldozer-family (or two for ymm, so always use xmm with implicit zero-extension).
Still, I don't think that justifies tying up a register for the duration of a whole function, especially not if it costs extra saves/restores. And not for P6-family CPUs where register-read stalls are a thing.

Linux x86 bootloader

I am trying to build a simple x86 Linux bootloader in nasm.
The Linux bzImage is stored on disk partition sda1 starting from the first sector.
I read the real mode code from the bzImage (15 sectors) into memory starting from 0x7E00.
However when i jump into the code it just hangs, nothing happens.
I have created code for the master boot record on sda. I's probably best if i just attach
the whole thing. I would like to know why it just hangs after the far jump instruction.
[BITS 16]
%define BOOTSEG 0x7C0
%define BOOTADDR (BOOTSEG * 0x10)
%define HDRSEG (BOOTSEG + 0x20)
%define HDRADDR (HDRSEG * 0x10)
%define KERNSEG (HDRSEG + 0x20)
[ORG BOOTADDR]
entry_section:
cli
jmp start
start:
; Clear segments
xor ax, ax
mov ds, ax
mov es, ax
mov gs, ax
mov fs, ax
mov ss, ax
mov sp, BOOTADDR ; Lots of room for it to grow down from here
; Read all 15 sectors of realmode code in the kernel
mov ah, 0x42
mov si, dap
mov dl, 0x80
int 0x13
jc bad
; Test magic number of kernel header
mov eax, dword [HDRADDR + 0x202]
cmp eax, 'HdrS'
jne bad
; Test jump instruction is there
mov al, byte [KERNSEG * 16]
cmp al, 0xEB
jne bad
xor ax, ax ; Kernel entry code will set ds = ax
xor bx, bx ; Will also set ss = dx
jmp dword KERNSEG:0
; Simple function to report an error and halt
bad:
mov al, "B"
call putc
jmp halt
; Param: char in al
putc:
mov ah, 0X0E
mov bh, 0x0F
xor bl, bl
int 0x10
ret
halt:
hlt
jmp halt
; Begin data section
dap: ; Disk address packet
db 0x10 ; Size of dap in bytes
db 0 ; Unused
dw 15 ; Number of sectors to read
dw 0 ; Offset where to place data
dw HDRSEG ; Segment where to place data
dd 0x3F ; Low order of start addres in sectors
dd 0 ; High order of start address in sectors
; End data section
times 446-($-$$) db 0 ; Padding to make the MBR 512 bytes
; Hardcoded partition entries
part_boot:
dw 0x0180, 0x0001, 0xFE83, 0x3c3f, 0x003F, 0x0000, 0xF3BE, 0x000E
part_sda2:
dw 0x0000, 0x3D01, 0xFE83, 0xFFFF, 0xF3FD, 0x000E, 0x5AF0, 0x01B3
part_sda3:
dw 0xFE00, 0xFFFF, 0xFE83, 0xFFFF, 0x4EED, 0x01C2, 0xb113, 0x001D
part_sda4:
dw 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
dw 0xAA55 ; Magic number at relative address 510
mbrend: ; Relative address 512
Assuming your code is a boot loader (and therefore is not an MBR):
Never disable IRQs unless you have to. The BIOS needs them to function correctly, and will enable them inside some BIOS functions anyway (e.g. waiting for a "sectors transferred" IRQ inside disk functions). Because your code is only loading and passing control to more real mode code (e.g. and no switch to protected mode or anything is involved) you have no reason to disable IRQs anywhere in your entire boot loader.
For real mode addressing, it's typically cleaner/easier to use 0x0000:0x7C00 rather than 0x07C0:0x0000. You seem to be attempting to mix both (e.g. set segment registers for the former but define BOOTSEG and HDRSEG for the latter).
The partition table contains "extended partitions" and not "primary partitions" and your partition table is therefore wrong (and should probably be blank/empty).
The boot loader should not assume any specific/hard-coded "starting LBA" (the "starting LBA" for the partition depends on how the end user felt like partitioning their disk when the OS is installed). You need to determine the partition's "starting LBA" from the MBR's primary partition table, which is typically done by hoping that the MBR left DS:SI pointing to your partition's partition table entry.
You should not assume that your are booting from "BIOS device 0x80". The MBR should leave DL set to the correct device number, and there should be no reason why your code shouldn't work if (e.g.) the OS is installed on the second hard drive or something else.
Your hard-coded "starting LBA to read" (in the DAP) is wrong. For historical reasons there's probably 63 sectors per track and your partition starts on the 64th sector. This means that LBA sector 0x3F is the first sector in the partition (which is your boot loader) and is not the first sector of the kernel. I'm assuming the first sector of the kernel might be LBA sector 0x40 (the second sector of the partition).
The "number of sectors" shouldn't be hard-coded either. You'd want to load the beginning of the kernel and examine it, and determine how many sectors to load where from that.
Typically 512 bytes (actually more like 446 bytes) is far too little for a decent boot loader. The first 512 bytes of a boot loader should load the rest of the boot loader (with every spare byte left over used to improve error handling - e.g. puts("Read error while trying to load boot loader") rather than just putc('B')). Everything else (loading the pieces of the kernel, setting up a video mode, setting correct values in the "real mode kernel header" fields, etc) should be in the additional sectors and not be in the first sector.
Note that the way the computer boots has been carefully designed such that any MBR can chainload any OS on any partition of any disk; and the MBR may be part of something larger (e.g. a boot manager) that allows multiple OSs to be installed (e.g. where the user can use a pretty menu or something to choose which partition the MBR's code should chain-load). This design allows the user to replace the MBR (or boot manager) with anything else at any time without effecting any installed OS (or causing all of their installed OSs to need fixing). For a simple example, a user should be able to have 12 different partitions that all contain your boot loader and a separate/independent version of Linux, and then install any boot manager (e.g. GRUB, Plop, GAG, MasterBooter, etc) that they want at any time.
For why your code hangs, it's not very important given that all of the code needs to be rewritten anyway . If you're curious, I'd strongly recommend running it inside an emulator with a debugger (e.g. Bochs) so that you can examine exactly what has happened (e.g. dump memory at 0x00007E00 to see what it contains, single-step the JMP to see what is being executed, etc).
Comment does not match code!
xor bx, bx ; Will also set ss = dx
I seriously doubt if that is your problem...
Disclaimer: I have not done this! I'm "chicken" and have always done my bootin' from floppy.
What I "expect" to see in an MBR is for it to move itself out of the way, and then load the first sector on the active partition to 7C00h again, and then jump there. This "real bootloader" loads the rest. I'm not familiar with the layout of a bzImage - maybe it'll work loaded at 7E00h...
I think I'm in over my head. I'll get me coat...

Hardware VGA Text Mode IO in old dos assembly Issue

After reading about at least the first 3 or 4 chapters of about 4 different books on assembly programming I got to a stage where I can put "Hello World" on a dosbox console using MASM 6.11. Imagine my delight!!
The first version of my program used DOS Function 13h.
The second version of my program used BIOS Function 10h
I now want to do the third version using direct hardware output. I have read the parts of the books that explain the screen is divided into 80x25 on a VGA monitors (not bothered about detecting CGA and all that so my program uses memory address 0B800h for colour VGA, because DOSBox is great and all, and my desire to move to Win Assembler sometime before im 90 years old). I have read that each character on the hardware screen is 2 bytes (1 for the attribute and one for the character itself, therefore you have 80x25x2=4000 bytes). The odd bytes describe the attribute, and the even bytes the ASCII character.
But my problem is this. No matter how I try, I cant get my program to output a simple black and white (which is just the attribute, I assume I can change this reasonably easily) string (which is just an array of bytes) 5 lines from the top of the screen, and 20 characters in from the left edge (which is just the number of blank characters away from a zero based index with 4000 bytes long). (if my calc is correct that is 5x80=400+20=420x2=840 is the starting position of my string within the array of 4000 bytes)
How do I separate the attribute from the character (I got it to work partially but it only shows every second character, then a bunch of random junk (thats how I figured I need some sort of byte pair for the attribute and text), or how do I set it up such that both are recognised together. How do I control he position of the text on the screen once the calcs are made. Where am I going wrong.
I have tried looking around the web for this seemingly simple question but am unable to find a solution. Is there anyone who used to program in DOS and x86 Assembly that can tell me how to do this easy little program by not using BIOS or DOS functions, just with the hardware.
I would really appreicate a simple code snippet if possible. Or a refrence to some site or free e-book. I dont want buying a big book on dos console programming which will end up useless when I move to windows shortly. The only reason I am focused on this is because I want to learn true assembly, not some macro language or some pretensious high level language that claims to be assembly.
I am trying to build a library of routines that will make Assembly easier to learn so people dont have to work though all the 3 to 6 chapters across 10 books of theory esentially explaining again and again the same stuff when really all that is needed is enough to know how to get some output, assign values to variables, get some input, and do some loops and decisions. The theory can come along later, and by the time they get to loops and decisions most people will have done enough assembler to have all the theory anyway. I beleive assembly should be taught no different than any other language starting with a simple hello world program then getting input ect. I want to make this possible. But hey, I'm just a beginner, maybe my taughts will change when I learn more.
One other note, I know for a fact the problem is NOT with DOSBox as I have a very old PC running true MS-DOS V6.2 and the program still doesnt work (but gives almost identical output). In fact, DOSBox actually runs some of my old programs even better than True dos. Gem desktop being one example. Just wanted to get that cleared before people try suggesting its a problem with the emulator. It cant be, not with such simple programs. No im afraid the problem is with my little brain not fully understanding what is needed.
Can anyone out there please help!!
Below is the program I used (MASM 6.1 Under DOSBox on Win 7 64-bit). It uses BIOS Intrrupt 10h Function 13h sub function 0. I want to do the very same using direct hardware IO.
.model small
.stack
.data ;part of the program containing data
;Constants - None
;Variables
MyMsg db 'Hello World'
.code
Main:
GetAddress:
mov ax,#data ;Gets address of data segment
mov es,ax ;Loads segment address into es regrister
mov bp,OFFSET MyMsg ;Load Offset into DX
SetAttributes:
mov bl,01001111b ;BG/FG Colour attributes
mov cx,11 ;Length of string in data segment
SetRowAndCol:
mov dh,24 ;Set the row to start printing at
mov dl,68 ;Set the column to start printing at
GetFunctionAndSub:
mov ah,13h ;BIOS Function 10h - String Output
mov al,0 ;BIOS Sub-Function (0-3)
Execute:
int 10h ;BIOS Interrupt 10h
EndProg:
mov ax,4c00h ;Terminate program return 0 to OS
int 21h ;DOS Interrupt 21h
end Main
end
I want to have this in a format that is easy to explain. So here is my current workings. I've almost got it. But it only prints the attributes, getting the characters on screen is a problem. (Ocasionally when I modify it slightly, I get every second character with random attributes (I think I know the technicalities of why, but dont know enough assembler to fix it)).
.model small
.stack
.data
;Constants
ScreenSeg equ 0B800h
;Variables
MyMsg db 'Hello World'
StrLen equ $-MyMsg
.code
Main:
SetSeg:
mov ax, ScreenSeg ;set segment register:
mov ds, ax
InitializeStringLoop: ;Display all characters: - Not working :( Y!
mov cx, StrLen ;number of characters.
mov di, 00h ;start from byte 'h'
OutputString:
mov [di], offset byte ptr MyMsg[di]
add di, 2 ;skip over next attribute code in vga memory.
loop OutputString
InitializeAttributeLoop:;Color all characters: - Atributes are working fine.
mov cx, StrLen ;number of characters.
mov di, 01h ;start from byte after 'h'
;Assuming I have all chars with same attributes - fine for now - later I would make this
;into a procedure that I will just pass the details into. - But for now I just want a
;basic output tutorial.
OutputAttributes:
mov [di], 11101100b ;light red(1100) on yellow(1110)
add di, 2 ;skip over next ascii code in vga memory.
loop OutputAttributes
EndPrg:
mov ax, 4C00h
int 21h
end Main
Of course I want to reduce the instructions used to the bare bones essentials. (for proper tuition purposes, less to cover when teaching others). Hense the reason I did not use MOVSB/W/D ect with REP. I opted instead for an easy to explain manual loop using standard MOV, INC, ADD ect. These are instructions that are basic enough and easy to explain to newcommers. So if possible I would like to keep it as close to this as possible.
I know esentially all that seems to be wrong is the loop for the actual string handler. Its not letting me increment the address the way I want it to. Its embarasssing to me cause I am actually quite a good progammer using C++, C#, VB, and Delphi (way back when)). I know you wouldnt think that given I cant even get a loop right in assembler, but it is such a different language. There are 2 or 3 loops in high level languages, and it seems there are an infinate combination of ways to do loops in assembler depending on the instructions. So I say "Simple Loop", but in reality there is little simple about it.
I hope someone can help me with this, you would be saving my assembly carreer, and ensuring I eventually become a good assembly teacher. Thanks in advance, and especially for reading this far.
The typical convention would be to use ds:si as source, and es:di as destination.
So it would end up being similar to (untested):
mov ax, #data
mov ds, ax
mov ax, ScreenSeg
mov es, ax
...
mov si, offset MyMsg
OutputString:
mov al, byte ptr ds:[si]
mov byte ptr es:[di], al
add si, 1 ; next character from string
add di, 2 ; skip over next attribute code in vga memory.
loop OutputString
I would suggest getting the Masm32 Package if you don't already have it. It is mainly geared towards easily using Assembly Language in "Windows today", which is very nice, but also will teach you a lot about Asm and also shows where to get the Intel Chip manuals that were mentioned in an earlier reply that are indispensable.
I started programming in the 80's and so I can see why you are so interested in the nitty gritty of it, I know I miss it. If more people were to start out there, it would pay off for them very much. You are doing a great service!
I am playing with exactly what you are talking about, Direct Hardware, and I have also learned that Windows has changed some of the DOS services and BIOS services have changed too, so that some don't work any more. I am in fact writing a small .com program and running it from Win7 in a Command Prompt Window, Prints a msg and waits for a key, Pretty cool considering it's Win7 in 2012!
In fact it was BIOS 10h - 0Eh that did not work and so I tried Dos 21h 02h to write to the screen and it worked. The code is below because it is a .com (Command Program) i thought it might be of use to you.
; This makes a .com program (64k Limit, Code, Data and all
; have to fit in this space. Used for small utilities and
; good for very fast tasks. In fact DOS Commands are mostly
; small .com programs like this (except more useful)!
;
; Assemble with Masm using
; c:\masm32\bin\ml /AT /c bfc.asm
; Link with Masm's Link16 using
; c:\masm32\bin\link16 bfc.obj,bfc.com;
;
; Link16 is the key to making this 16bit .com (Command) file
SEGMT SEGMENT
org 100h
Start:
push CS
pop DS
MOV SI, OFFSET Message
Next:
MOV ah, 02h ; Write Char to Standard out
MOV dl, [si] ; Char
INT 21h ; Write it
INC si ; Next Char
CMP byte ptr[si], 0 ; Done?
JNE Next ; Nope
WaitKey:
XOR ah, ah ; 0
INT 16h ; Wait for any Key
ExitHere:
MOV ah, 4Ch ; Exit with Return Code
xor al, al ; Return Code
INT 21h
Message db "It Works in Windows 7!", 0
SEGMT ENDS
END Start
I used to do all of what you are talking about. Trying to remember the details. Michael Abrash is a name you should be googling for. Mode-X for example a 200 something by 200 (240x200?) 256 color mode was very popular as it broke the 16 color boundary and at the time the games looked really good.
I think that the on the metal register programming was doable but painful and you likely need to get the programmers reference/datasheet for the chip you are interested in. As time passed from CGA to EGA to VGA to VESA the way things worked changed as well. I think typically you use int something calls to get access to the page frame then you could fill that in directly. VESA I think worked that way, VESA was a big livesaver when it came to video card support, you used to have to write your own drivers for each chip before then (if you didnt want the ugly standard modes).
I would look at mode-x or vesa and go from there. You need to have a bit of a hacker inside to get through some of this anyway, it is very rare to find a datasheet/programmers reference manual that is complete and accurate, you always have to just shove some bytes around to see what happens. Start filling those memory blocks that are supposed to be the page frames until you see something change on the screen...
I dont see any specific graphics programming books in my library other than the abrash books like the graphics programming black book, which was at the tail end of this period of time. I have bios and dos programmers references and used ralf browns list too. I am sure that I had copies of the manuals for popular video chips (before the internet remember you called a human on that phone thing with a cord hanging out of it, the human took a printed manual, sometimes nicely bound sometimes just a staple in the corner if that, put it in an envelope and mailed it to you and that was your only copy unless you ran it through the copier). I have stacks of printed stuff that, sorry to say, am not going to go through to answer this question. I will keep this question in my mind though and look around some more for info, actually I may have some of my old programs handy, drawing fractals and other such things (direct as practical to the video card/memory).
EDIT.
I know you are looking for text mode stuff, and this is a graphics mode but it may or may not shed some light on what you are trying to do. combination of int calls and filling pages and palette memory directly.
http://dwelch.s3.amazonaws.com/fly.tar.gz

Resources