ASCIIZ: a string ending with a zero byte

I was writing an assembly-language program to create a file:
.model small
.data
Fn db "test"
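.code
mov ax,@data ; point DS at the data segment
mov ds,ax
mov CX,00 ; file attributes: normal
lea DX,Fn ; DS:DX -> file name
mov ah,3ch ; DOS: create file
int 21h
mov ah,4ch ; DOS: terminate program
int 21h
end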
Although the program assembled without errors, the file was not created, so I searched the internet for the reason.
That is how I found out about ASCIIZ.
So I replaced the data segment with
.data
Fn db "test", 0
It worked.
Why do we need to use ASCIIZ and why can't a normal string be used to create a file?

Let's say you have multiple strings in your .data section:
Fn db "test"
s1 db "aaa"
s2 db "bbb"
When you assemble it, the .data section will contain all three strings, one after the other:
0x74 0x65 0x73 0x74 0x61 0x61 0x61 0x62 0x62 0x62
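which is the binary representation of testaaabbb.
There must be a way for functions to figure out where the first string ends and the second begins. DOS function 3Ch is a good example: it receives only the address of the file name in DS:DX, with no length, so it has to find the end of the name by itself. The marker used for this is the 0x00 byte ("\x00"). A string ending with it is known as a "null-terminated string" or ASCIIZ string, and the marker tells you where your string ends: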
Fn db "test",0
s1 db "aaa",00h ; same thing, written in hex
s2 db "bbb",0 ; same again; "bbb\x00" would NOT work, MASM has no escape sequences in strings
Now your .data section will look like this:
0x74 0x65 0x73 0x74 0x00 0x61 0x61 0x61 0x00 0x62 0x62 0x62 0x00
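which is test\x00aaa\x00bbb\x00. Now there is a delimiter between the strings, so when you pass the starting address of a string to a function, it knows exactly where the string ends. In your original program there was no such marker after "test", so DOS kept reading whatever bytes happened to follow as part of the file name, which most likely produced an invalid name.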
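To make the scanning concrete, here is a small Python model of the .data bytes from the answer. It finds each string's end the way an ASCIIZ consumer does: start at the given offset and stop at the first zero byte. The name `find_asciiz` is illustrative only; DOS performs this scan internally and does not expose such a function.

```python
def find_asciiz(memory: bytes, start: int) -> bytes:
    """Return the bytes from `start` up to (not including) the first zero byte."""
    # In real memory the scan would just continue into whatever follows;
    # here bytes.index raises ValueError when no terminator exists.
    end = memory.index(0, start)
    return memory[start:end]

# Unterminated layout: "test", "aaa", "bbb" packed back to back.
data_plain = b"testaaabbb"

# Terminated layout from the answer: each string followed by a 0x00 byte.
data_asciiz = b"test\x00aaa\x00bbb\x00"

# With terminators, each string is recoverable from its start offset alone.
fn = find_asciiz(data_asciiz, 0)  # Fn
s1 = find_asciiz(data_asciiz, 5)  # s1
s2 = find_asciiz(data_asciiz, 9)  # s2
print(fn, s1, s2)

# Without terminators there is no end marker at all.
try:
    find_asciiz(data_plain, 0)
except ValueError:
    print("no terminator found in data_plain")
```

Note that a function given only the address of Fn in `data_plain` has no way to tell "test" apart from "testaaabbb".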

Related

NASM ASSEMBLY - Print "Hello World"

I've created a string and treat it as an array, looping through each index and moving each character into the AL register so it can be printed to the VGA screen. The problem is that it prints the right number of characters, but the characters themselves are gibberish. Can you please help me figure out what the problem is in the code? It would be highly appreciated.
org 0
bits 16
section .text
global _start
_start:
mov si, msg
loop:
inc si
mov ah, 0x0e
mov al, [si]
or al, al
jz end
mov bh, 0x00
int 0x10
jmp loop
end:
jmp .done
.done:
jmp $
msg db 'Hello, world!',0xa
len equ $ - msg
TIMES 510 - ($ - $$) db 0
DW 0xAA55
Bootloader code:
ORG 0x7c00
BITS 16
boot:
mov ah, 0x02
mov al, 0x01
mov ch, 0x00
mov cl, 0x02
mov dh, 0x00
mov dl, 0x00
mov bx, 0x1000
mov es, bx
int 0x13
jmp 0x1000:0x00
times 510 - ($ - $$) db 0
dw 0xAA55
The bootloader
Before tackling the kernel code, let's look at the bootloader that brings the kernel into memory.
You have written a very minimalistic version of a bootloader, one that omits much of the usual stuff like setting up segment registers, but thanks to its reduced nature that's not really a problem.
What could be a problem is that you wrote mov dl, 0x00, hardcoding a zero to select the first floppy as your bootdisk. No problem if this is indeed the case, but it would be much better to just use whatever value the BIOS preloaded the DL register with. That's the ID for the disk that holds your bootloader and kernel.
What is a problem is that you load the kernel to the segmented address 0x1000:0x1000 and then later jump to the segmented address 0x1000:0x0000 which is 4096 bytes short of the kernel. You got lucky that the kernel code did run in the end, thanks to the memory between these two addresses most probably being filled with zero-bytes that (two by two) translate into the instruction add [bx+si], al. Because you omitted setting up the DS segment register, we don't know what unlucky byte got overwritten so many times. Let's hope it was not an important byte...
mov bx, 0x1000
mov es, bx
xor bx, bx ; <== You forgot to write this instruction!
int 0x13
jmp 0x1000:0x0000
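The 4096-byte gap follows from real-mode address arithmetic: the linear address is segment × 16 + offset. A quick Python calculation (ignoring A20 wrap-around, which does not matter at these addresses) shows where the sector was written versus where the jump lands:

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset

# Original code: ES = BX = 0x1000 and BX was never cleared, so ES:BX = 0x1000:0x1000.
load_address = physical_address(0x1000, 0x1000)
# jmp 0x1000:0x0000 lands here.
jump_target = physical_address(0x1000, 0x0000)

print(hex(load_address), hex(jump_target), load_address - jump_target)

# With xor bx, bx added, the sector is loaded exactly where the jump goes.
fixed_load_address = physical_address(0x1000, 0x0000)
print(fixed_load_address == jump_target)
```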
What is also a problem is that you ignore the possibility of errors when loading a sector from the disk. At the very least you should inspect the carry flag that BIOS.ReadSector (function 02h) reports, and if the flag is set, abort cleanly. A more sophisticated approach would also retry a limited number of times, say 3 times.
ORG 0x7C00
BITS 16
; IN (dl)
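mov dh, 0x00 ; DL is bootdrive, DH = head 0
mov cx, 0x0002 ; CH = cylinder 0, CL = sector 2
mov bx, 0x1000
mov es, bx
xor bx, bx ; ES:BX = 0x1000:0x0000, the kernel's load address
mov ax, 0x0201 ; BIOS.ReadSector (AH=02h), 1 sector (AL=01h)
int 0x13 ; -> AH CF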
jc ERR
jmp 0x1000:0x0000
ERR:
cli
hlt
jmp ERR
times 510 - ($ - $$) db 0
dw 0xAA55
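The bootloader above aborts on the first failure. The bounded-retry idea mentioned in the answer can be sketched as a small Python model of the control flow. `read_sector` and `reset_disk` are hypothetical stand-ins for INT 13h function 02h (read) and function 00h (reset disk system); a real bootloader would test the carry flag rather than a return value, and resetting between attempts is a common convention, not something the answer prescribes.

```python
from typing import Callable

def load_with_retries(read_sector: Callable[[], bool],
                      reset_disk: Callable[[], None],
                      max_attempts: int = 3) -> bool:
    """Try read_sector up to max_attempts times; return True on success.

    read_sector models INT 13h AH=02h and returns True when CF would be clear.
    reset_disk models INT 13h AH=00h and is called before each retry.
    """
    for attempt in range(max_attempts):
        if read_sector():
            return True           # CF clear: jump to the kernel
        if attempt < max_attempts - 1:
            reset_disk()          # CF set: reset and try again
    return False                  # give up: cli / hlt, like the ERR label


def make_flaky_reader(failures: int) -> Callable[[], bool]:
    """Hypothetical reader that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def read() -> bool:
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True

    return read


resets = []
ok = load_with_retries(make_flaky_reader(2), lambda: resets.append(1))
print(ok, len(resets))
```

With two simulated failures the third attempt succeeds after two resets; with three failures the function gives up.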
The kernel
After the jmp 0x1000:0x0000 instruction has brought you to the first instruction of your kernel, the CS code segment register holds the value 0x1000. None of the other segment registers changed, and since you did not set up any of them in the bootloader, we still don't know what any of them contain. However, in order to retrieve the bytes of the message at msg with the mov al, [si] instruction, we need a correct value in the DS data segment register. In accordance with the ORG 0 directive, the correct value is the one we already have in CS. Just two 1-byte instructions are needed: push cs followed by pop ds.
There's more to be said about the kernel code:
The printing loop uses a pre-increment on the pointer in the SI register. Because of this the first character of the string will not get displayed. You could compensate for this via mov si, msg - 1.
The printing loop processes a zero-terminated string. You don't need to prepare that len equate. What you do need is an explicit zero byte that terminates the string. You should not rely on the large number of zero bytes that times produced. In some future version of the code there might be no zero byte at all!
You (think you) have included a newline (0xa) in the string. For the BIOS.Teletype function 0Eh, this is merely a linefeed that moves down on the screen. To obtain a newline, you need to include both carriage return (13) and linefeed (10).
There's no reason for your kernel code to have the bootsector signature bytes at offset 510. Depending on how you get this code to the disk, it might be necessary to pad the code up to (a multiple of) 512, so keep times 512 - ($ - $$) db 0.
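The effect of the pre-increment can be seen by modelling both loops in Python over the corrected message bytes. Each line in the functions is annotated with the instruction it stands for; this is a simulation of the logic, not a translation of the actual BIOS call.

```python
MSG = b"Hello, world!\r\n\x00"  # CR LF, then an explicit zero terminator

def print_pre_increment(mem: bytes, si: int) -> bytes:
    """Original loop: inc si happens before mov al, [si]."""
    out = bytearray()
    while True:
        si += 1              # inc si
        al = mem[si]         # mov al, [si]
        if al == 0:          # or al, al / jz end
            return bytes(out)
        out.append(al)       # int 0x10 (teletype)

def print_post_increment(mem: bytes, si: int) -> bytes:
    """Corrected loop: load the byte first, then advance, then test for zero."""
    out = bytearray()
    while True:
        al = mem[si]         # mov al, [si]
        si += 1              # inc si
        if al == 0:          # test al, al / jnz PrintLoop
            return bytes(out)
        out.append(al)       # int 0x10 (teletype)

print(print_pre_increment(MSG, 0))   # first character is skipped
print(print_pre_increment(MSG, -1))  # the "mov si, msg - 1" compensation
print(print_post_increment(MSG, 0))  # full message
```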
The kernel:
ORG 0
BITS 16
section .text
global _start
_start:
push cs
pop ds
mov si, msg
mov bx, 0x0007 ; DisplayPage=0, GraphicsColor=7 (White)
jmp BeginLoop
PrintLoop:
mov ah, 0x0E ; BIOS.Teletype
int 0x10
BeginLoop:
mov al, [si]
inc si
test al, al
jnz PrintLoop
cli
hlt
jmp $-2
msg db 'Hello, world!', 13, 10, 0
TIMES 512 - ($ - $$) db 0
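The padding rule from the answer (pad up to a multiple of 512) is simple modular arithmetic. Here is a Python sketch; the kernel sizes used are made-up examples, not the size of the code above. Note that TIMES 512 - ($ - $$) only covers the single-sector case.

```python
SECTOR_SIZE = 512

def padding_to_sector(code_length: int, sector: int = SECTOR_SIZE) -> int:
    """Zero bytes needed to reach the next multiple of `sector`."""
    return (-code_length) % sector

print(padding_to_sector(60))   # a 60-byte kernel needs 452 bytes to fill one sector
print(padding_to_sector(512))  # already a full sector: no padding
print(padding_to_sector(700))  # pads up to 1024 bytes, i.e. two sectors
```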

Using db to declare a string in assembly NASM

I am following a tutorial to write a hello world bootloader in assembly, and I am using the NASM assembler for an x86 machine. This is the code I am using:
[BITS 16] ;Tells the assembler that it's 16-bit code
[ORG 0x7C00] ;Origin: tells the assembler where the code will
;be in memory after it has been loaded
MOV SI, HelloString ;Store string pointer to SI
CALL PrintString ;Call print string procedure
JMP $ ;Infinite loop, hang it here.
PrintCharacter: ;Procedure to print character on screen
;Assume that ASCII value is in register AL
MOV AH, 0x0E ;Tell BIOS that we need to print one character on screen.
MOV BH, 0x00 ;Page no.
MOV BL, 0x07 ;Text attribute 0x07 is lightgrey font on black background
INT 0x10 ;Call video interrupt
RET ;Return to calling procedure
PrintString: ;Procedure to print string on screen
;Assume that string starting pointer is in register SI
next_character: ;Label to fetch next character from string
MOV AL, [SI] ;Get a byte from string and store in AL register
INC SI ;Increment SI pointer
OR AL, AL ;Check if value in AL is zero (end of string)
JZ exit_function ;If end then return
CALL PrintCharacter ;Else print the character which is in AL register
JMP next_character ;Fetch next character from string
exit_function: ;End label
RET ;Return from procedure
;Data
HelloString db 'Hello World', 0 ;HelloWorld string ending with 0
TIMES 510 - ($ - $$) db 0 ;Fill the rest of sector with 0
DW 0xAA55 ;Add boot signature at the end of bootloader
I have some difficulty understanding how I can place the complete 'Hello World' string into one byte using the db directive. As I understand it, db stands for define byte and it places the given byte directly in the executable, but surely 'Hello World' is larger than a byte. What am I missing here?
The pseudo-instructions db, dw, dd and friends can define multiple items:
db 34h ;Define byte 34h
db 34h, 12h ;Define bytes 34h and 12h (i.e. word 1234h)
They accept character constants too:
db 'H', 'e', 'l', 'l', 'o', 0
but this syntax is awkward for strings, so a whole string is also accepted as a single operand:
db "Hello", 0 ;Equivalent of the above
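To check that these forms really produce identical bytes, here is a Python sketch that models what the assembler emits, using Python only as a byte calculator. The little-endian packing of dw reflects x86 byte order.

```python
import struct

# db 34h, 12h  -> two bytes, in the order written
two_bytes = bytes([0x34, 0x12])

# dw 1234h is stored little-endian on x86, giving the same two bytes
word = struct.pack("<H", 0x1234)

# db 'H', 'e', 'l', 'l', 'o', 0  versus  db "Hello", 0
chars = bytes([ord(c) for c in "Hello"] + [0])
string = b"Hello" + b"\x00"

print(two_bytes == word, chars == string, len(string))
```

So a single db line with a string operand simply emits one byte per character: six bytes for "Hello", 0.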
P.S. In general, prefer the user-level directives (BITS 16 rather than [BITS 16]), though for BITS and ORG it makes no difference.

NASM bootloader: why does it matter where I define the string?

I'm experimenting with writing a bootloader in NASM; at the moment it just prints a string.
[BITS 16]
[org 0x7c00]
myString:
db 'Hello World', 0x00
mov bp, 0x8000
mov sp, bp
mov bx, myString
call printString
jmp $
printString:
pusha
mov ah , 0x0e
printStringA:
mov al , [bx]
cmp al, 0x00
je printStringB
int 0x10
add bx, 0x01
jmp printStringA
printStringB:
popa
ret
times 510 -( $ - $$ ) db 0
dw 0xaa55
That works fine, but if I move the string definition here:
[BITS 16]
[org 0x7c00]
mov bp, 0x8000
mov sp, bp
myString:
db 'Hello World', 0x00
mov bx, myString
call printString
jmp $
printString:
pusha
mov ah , 0x0e
printStringA:
mov al , [bx]
cmp al, 0x00
je printStringB
int 0x10
add bx, 0x01
jmp printStringA
printStringB:
popa
ret
times 510 -( $ - $$ ) db 0
dw 0xaa55
it prints out garbage. I'm running this in Bochs under Windows, if that helps.
You are assembling to raw machine code. There are no data and text sections. Everything is interpreted as code, including what you insert using db. Hence both code snippets are wrong.
If you finish with an endless loop (as in your example) or a halt instruction, the data can safely be put after the code as it will never be reached. Otherwise you must arrange for the data to be skipped over.
You also need to set the segment registers correctly at the start.
Here is a corrected version, with early declaration of data:
[BITS 16]
[ORG 0]
;;; Set CS and DS
jmp 0x07c0:start
start:
mov ax, cs
mov ds, ax
;;; set SP
mov bp, 0x8000
mov sp, bp
;;; skip over data
jmp L1
myString:
db 'Hello World', 0x00
L1:
mov bx, myString
...
Note that in your first example, the data was interpreted as code.
db 'Hello World', 0x00
is assembled as
48 65 6c 6c 6f 20 57 6f 72 6c 64 00
and corresponds to:
dec ax
gs insb
insb
outsw
and [bx+0x6f],dl
jc short 0x76
fs
db 0x00
In effect this gets executed before your code. It is pure luck that this fragment doesn't prevent your code from working.
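The byte listing above, and why the CPU runs into it, can be checked with a short Python calculation. It assumes the layout of the first example, where the string is the very first thing at the load address 0x7C00, so the first real instruction only starts after the string's 12 bytes.

```python
data = b"Hello World" + b"\x00"   # db 'Hello World', 0x00
print(data.hex(" "))

# The BIOS starts executing at 0x7C00 regardless of what is stored there,
# so the string bytes are decoded as instructions first.
ORIGIN = 0x7C00
first_instruction = ORIGIN + len(data)  # where mov bp, 0x8000 was assembled
print(hex(first_instruction))
```

The printed hex matches the 48 65 6c ... 00 listing, and the intended code begins at 0x7C0C, twelve bytes past the entry point.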

asm little-endian register/immediate/memory order

I'm quite new to assembler; I have only done some programming on 8-bit microcontrollers before.
Now I have trouble understanding how little-endian data is stored. I have already read the Wikipedia article (http://en.wikipedia.org/wiki/Endianness) and some threads here, but I'm still confused.
CPU: x64
Assembler: yasm
OS: Linux
Now the questions:
MOV r32,imm32:
section .bss
var: resb 4 ;reserve 4 bytes
varlen: equ $-var
section .text
global _start
_start:
MOV R10D, 0x6162630A
MOV [var], R10D
CMP R10B, 0x0A
JNE end ;skip printing if the low byte is not 0x0A
MOV eax, 0x04 ;sys_write (32-bit Linux syscall number)
MOV ebx, 0x01 ;fd 1 = stdout
MOV ecx, var
MOV edx, varlen
int 0x80 ;tell the kernel to print the msg
end:
MOV eax, 0x01 ;sys_exit
MOV ebx, 0x00 ;exit status 0
int 0x80
output:
LF (linefeed, i.e. 0x0A)
cba
The code above shows that the constant is written to the register without byte-swapping, but why does the output read from the highest memory address to the lowest?
Why is the constant in the register not swapped according to little-endian? Is this assembler-dependent?
Is this correct?
Register (bit 31 on the left, bit 0 on the right):
|61h|62h|63h|0Ah|
Memory (address increments in bytes):
adr   data
0x00  61h
0x01  62h
0x02  63h
0x03  0Ah
MOV r32, m32
section .data
msg: db 0x70,0x71,0x72,0x0a
msglen:equ $-msg
section .text
global _start
_start:
MOV EAX, [msg]
CMP AL, 0x70
JNE end
MOV eax, 0x04
MOV ebx, 0x01
MOV ecx, msg
MOV edx, msglen
int 0x80
end:
MOV eax, 0x01
MOV ebx, 0x00
int 0x80
output:
pqrLF (LF represents a linefeed)
Is this correct?
Register (bit 31 on the left, bit 0 on the right):
|0Ah|72h|71h|70h|
Memory (address increments in bytes):
adr   data
0x00  70h
0x01  71h
0x02  72h
0x03  0Ah
Could this be the conclusion:
data in registers is big-endian and data in memory is little-endian?
Thank you for your responses.
Michael
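Talking about endianness of registers makes no sense, as registers do not have memory addresses. Endianness only shows up when a value is stored to or loaded from memory, and x86 puts the least significant byte at the lowest address. So MOV [var], R10D with R10D = 0x6162630A writes 0Ah, 63h, 62h, 61h to var, var+1, var+2, var+3, which is why the write prints a linefeed followed by "cba".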
From your Wikipedia source: "The terms endian and endianness refer to the convention used to interpret the bytes making up a data word when those bytes are stored in computer memory"
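Python's struct module with the "<" (little-endian) format reproduces exactly what the two programs in the question do, so it is a convenient way to check such diagrams. This models the x86 store/load byte order; it does not execute the assembly.

```python
import struct

# MOV R10D, 0x6162630A / MOV [var], R10D: the low byte goes to the lowest address.
stored = struct.pack("<I", 0x6162630A)
print(stored.hex(" "))        # memory order: LF, 'c', 'b', 'a'

# CMP R10B, 0x0A compares the low byte of the register.
print(hex(0x6162630A & 0xFF))

# MOV EAX, [msg] with msg db 0x70,0x71,0x72,0x0a
msg = bytes([0x70, 0x71, 0x72, 0x0A])
eax = struct.unpack("<I", msg)[0]
print(hex(eax))               # byte at the highest address becomes the top byte
print(hex(eax & 0xFF))        # AL holds the byte from the lowest address
```

Both results follow from the same rule, so the register diagrams in the question are fine; it is the first memory table that has the bytes in the wrong order.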

Length of input string in assembly language

I want to do two things:
1) Take a string from user
2) Find the length of that string
I tried the following code:
.model small
.stack 100h
.data
MAXLEN DB 100
ACT_LEN DB 0 ;Actual length of the string
ACT_DATA DB 100 DUP('$') ;String will be stored in ACT_DATA
MSG1 DB 10,13,'ENTER STRING : $'
.CODE
START:
MOV AX,@data
MOV DS,AX
;Normal printing
LEA DX,MSG1
MOV AH,09H
INT 21H
;Can't understand the code from here on!
LEA DX,ACT_DATA
MOV AH,0AH
MOV DX,OFFSET MAXLEN
INT 21H
LEA SI,ACT_DATA
MOV CL,ACT_LEN
;AND THEN SOME OPERATIONS
END START
But I am confused about how the length ends up in the CL register, i.e. how does the ACT_LEN value get incremented? And what does MOV AH,0AH have to do with the length?
Int 21/AH=0Ah
Format of DOS input buffer (Table 01344):
Offset  Size     Description
00h     BYTE     maximum characters buffer can hold (MAXLEN)
01h     BYTE     (call) number of chars from last input which may be recalled (ACT_LEN)
                 (ret) number of characters actually read, excluding CR
02h     N BYTEs  actual characters read, including the final carriage return (ACT_DATA)
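The buffered input function fills in these values. On return, the byte at offset 01h (ACT_LEN) holds the number of characters actually typed, excluding the CR, so MOV CL,ACT_LEN simply reads the length that DOS stored there. Your code never increments it.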
LEA DX,ACT_DATA
MOV AH,0AH
MOV DX,OFFSET MAXLEN
INT 21H
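You do not need LEA DX,ACT_DATA: the MOV DX,OFFSET MAXLEN that follows overwrites DX anyway, and function 0Ah expects DS:DX to point at the start of the buffer (MAXLEN), not at the characters.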
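MOV AH,0AH does not select the interrupt: INT 21H is the interrupt, and the value in AH selects the DOS function it performs. Function 0Ah is buffered keyboard input, which reads a line into the buffer at DS:DX and stores the count of characters read in ACT_LEN. Ralf Brown's Interrupt List is a big list of interrupts with descriptions of what goes in and what comes out.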
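To see how the table maps onto the variables in the question, here is a Python sketch that decodes a buffer laid out as function 0Ah leaves it. The buffer contents are simulated (assuming the user typed hello and pressed Enter), and `parse_dos_input_buffer` is a hypothetical helper, not part of DOS.

```python
from typing import Tuple

def parse_dos_input_buffer(buf: bytes) -> Tuple[int, int, bytes]:
    """Split an INT 21h/AH=0Ah buffer into (MAXLEN, ACT_LEN, characters)."""
    maxlen = buf[0]                  # offset 00h: set by the program before the call
    act_len = buf[1]                 # offset 01h: filled in by DOS, excludes the CR
    text = buf[2:2 + act_len]        # offset 02h: the characters typed
    if buf[2 + act_len] != 0x0D:     # DOS stores the carriage return right after them
        raise ValueError("missing terminating carriage return")
    return maxlen, act_len, text

# Simulated buffer after typing "hello" + Enter.
# ACT_DATA was initialised with 100 DUP('$'), so untouched bytes stay '$'.
buffer = bytes([100, 5]) + b"hello\r" + b"$" * 94

maxlen, act_len, text = parse_dos_input_buffer(buffer)
print(maxlen, act_len, text)
```

The value that MOV CL,ACT_LEN picks up is simply `buf[1]` here: DOS wrote it, and nothing in the program counts characters.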

Resources