Infinite loops seemingly not working in NASM? - graphics

I'm trying to make a DOS program in NASM that uses interrupt 10h to display a pixel cycling through the 16 available colors in the top left corner. I also use interrupt 21h to only make the program run every 1/100 seconds (100 fps).
segment .data
pixelcolor: db 0
pixelx: dw 100
pixely: dw 100
timeaux: db 0 ; used later on to force the program to run at 100fps
segment .text
global _start
_start:
mov ah,00h
mov al,0dh
int 10h
mov ah,0bh
mov bh,00h
mov bl,00h
int 10h
.infinite:
mov ah,2ch
int 21h ; get system time
cmp dl,timeaux ; if 1/100 seconds haven't passed yet...
je .infinite ; ...skip current frame
; else, continue normally
mov byte[timeaux],dl
mov ah,00h
mov al,0dh
int 10h
mov ah,0bh
mov bh,00h
mov bl,00h
int 10h
mov ah,0ch
mov al,pixelcolor
mov cx,pixelx
mov dx,pixely
int 10h
inc byte[pixelcolor]
jmp .infinite
However, when I actually run the program in DOSBox, the pixel just stays red. Does anyone know why my infinite loops aren't working? (Note: I'm very new to NASM, so honestly I'm not even suprised my programs only work 15% of the time.)

The problem isn't actually the loop itself. What the loop is doing each iteration is the problem. Some issues and observations I have are:
Since this is a DOS COM program you will need an org 100h at the top since a COM program is loaded by the DOS loader to offset 100h of the current program segment. Without this the offsets of your data will be incorrect leading to data being read/written to from the wrong memory locations.
You have a problem with mov al,pixelcolor. It needs to be mov al,[pixelcolor]. Without square brackets1 the offset of pixelcolor is moved to AL, not what is stored at offset of pixelcolor. The same goes for pixelx and pixely. Your code prints the same pixel color (red in your case) to the wrong place2 on the screen repeatedly. This code:
mov ah,0ch
mov al,pixelcolor
mov cx,pixelx
mov dx,pixely
int 10h
inc byte[pixelcolor]
should be:
mov ah,0ch
mov al,[pixelcolor]
mov cx,[pixelx]
mov dx,[pixely]
int 10h
inc byte[pixelcolor]
It should be noted that the resolution of the timer by default will only be 18.2 times a second (~55ms). This is less resolution than the 1/100 of a second you are aiming for.
Some versions of DOS may always return 0 for the 1/100 of a second value.
Use of the BIOS to write pixels to the screen may make coding simpler (it abstracts away differences in the video modes) but will be quite slow compared to writing pixels directly to memory.
I would recommend Borland's Turbo Debugger (TD) for debugging DOS software. Turbo Debugger is included in a number of Borland's DOS C/C++ compiler suites.
Footnotes
1The use of brackets [] in NASM differs from MASM/TASM/JWASM.
2Although your question says you want to write to the upper left of the screen, the code suggests you really intended to write the pixel at coordinate 100,100.

Related

Is it possible for a program to read itself?

Theoretical question. But let's say I have written an assembly program. I have "labelx:" I want the program to read at this memory address and only this size and print to stdout.
Would it be something like
jmp labelx
And then would i then use the Write syscall , making sure to read from the next instruction from labelx:
mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall
to then output to stdout.
However how would I obtain the size to read itself? Especially if there is a
label after the code i want to read or code after. Would I have to manually
count the lines?
mov rdx,rip+(bytes*lines)
And then syscall with populated registers for the syscall to write to from rsi to rdi. Being stdout.
Is this Even possible? Would i have to use the read syscall first, as the write system call requires rsi to be allocated memory buffer. However I assumed .text is already allocated memory and is read only. Would I have to allocate onto the stack or heap or a static buffer first before write, if it's even possible in the first place?
I'm using NASM syntax btw. And pretty new to assembly. And just a question.
Yes, the .text section is just bytes in memory, no different from section .rodata where you might normally put msg: db "hello", 10. x86 is a Von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. Use objdump -drwC -Mintel on a linked executable to see the machine-code bytes, or GDB's x command in a running process, to see bytes anywhere.
You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using mov edx, prog_end - prog_start in the code at the point where you want that size in RDX.
See How does $ work in NASM, exactly? for more about subtracting two labels (in the same section) to get a size. (Where $ is an implicit label at the start of the current line, although $ isn't likely what you want here.)
To get the current address into a register, you need a RIP-relative LEA, not mov, because RIP isn't a general-purpose register and there's no special form of mov that reads it.
here:
lea rsi, [rel here] ; with DEFAULT REL you could just use [here]
mov edi, 1 ; stdout fileno
mov edx, .end - here ; assemble-time constant size calculation
mov eax, 1 ; __NR_write
syscall
.end:
This is fully position-independent, unlike if you used mov esi, here. (How to load address of function or label into register)
The LEA could use lea rsi, [rel $] to assemble to the same machine-code bytes, but you want a label there so you can subtract them.
I optimized your MOV instructions to use 32-bit operand-size, implicitly zero-extending into the full 64-bit RDX and RAX. (And RDI, but write(int fd, void *buf, size_t len) only looks at EDI anyway for the file descriptor).
Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, put the start/end labels anywhere. (e.g. foo: and .end:, and mov edx, foo.end - foo taking advantage of how NASM local labels work, by appending to the previous non-local label, so you can reference them from somewhere else. Or just give them both non-dot names.)

Incrementing one to a variable in IA32 Linux Assembly

I'm trying to increment 1 to a variable in IA32 Assembly in Linux
section .data
num: dd 0x1
section .text
global _start
_start:
add dword [num], 1
mov edx, 1
mov ecx, [num]
mov ebx,1
mov eax,4
int 0x80
mov eax,1
int 0x80
Not sure if it's possible to do.
In another literature I saw the follow code:
mov eax, num
inc eax
mov num, eax
Is it possible to increment a value to a var without moving to a register?
If so, do I have any advantage moving the value to a register?
Is it possible to increment a value to a var without moving to a register?
Certainly: inc dword [num].
Like practically all x86 instructions, inc can take either a register or memory operand. See the instruction description at http://felixcloutier.com/x86/inc; the form inc r/m32 indicates that you can give an operand which is either a 32-bit register or 32-bit memory operand (effective address).
If you're interested in micro-optimizations, it turns out that add dword [num], 1 may still be somewhat faster, though one byte larger, on certain CPUs. The specifics are pretty complicated and you can find a very extensive discussion at INC instruction vs ADD 1: Does it matter?. This is partly related to the slight difference in effect between the two, which is that add will set or clear the carry flag according to whether a carry occurs, while inc always leaves the carry flag unchanged.
If so, do I have any advantage moving the value to a register?
No. That would make your code larger and probably slower.

How to convert an 8086 emu assembly program to linux assembly comaptible

I am writing a code to convert hex (A-F) to decimal in assembly. I managed to write it on 8086 emu but I need it for linux. I need help.
The code works absolutely fine on 8086 emulator n windows. But I am unable to convert it into Linux syntax. I am not familiar with the Linux Syntax for assembly.
This is my 8686 code.
org 100h
.model small
.stack 100h
.data
msg1 db 'Enter a hex digit:$'
msg2 db 'In decimal it is:$'
.code
main proc
mov ax,#data
mov ds,ax
lea dx,msg1
mov ah,9
int 21h
mov ah,1
int 21h
mov bl,al
sub bl,17d ; convert to corrosponding hex value
mov ah,2
mov dl,0dh
int 21h
mov dl,0ah
int 21h
lea dx,msg2
mov ah,9
int 21h
mov dl,49d ;print 1 at first
mov ah,2
int 21h
mov dl,bl
mov ah,2 ; print next value of hex after 1
int 21h
main endp
end main
ret
To do such a conversion, you have to consider two things:
Your code is segmented 16-bit assembly code. Linux does not use segmented 16-bit code, but either flat 32-bit or 64-bit code.
"Flat" means that the selectors (cs, ds, es, ss which are not "segment" registers but "selectors" in 32-bit mode) have a pre-defined value which should not be changed.
In 32-bit mode the CPU instructions (and therefore the assembler instructions) are a bit different from 16-bit mode.
Interrupts are environment dependent. int 21h for example is an MS-DOS interrupt, which means that int 21h is only available if the operating system used is compatible to MS-DOS or you use some software (such as "8086 emu") that emulates MS-DOS.
x86 Linux uses int 80h in 32-bit programs to call operating system functions. Unfortunately, many quite "handy" functions of int 21h are not present in Linux. One example would be keyboard input:
If you don't want the default behavior (complete lines are read with echo; the program can read the first character of a line when a complete line has been typed), you'll have to send a so-called ioctl()-code to the system...
And of course the syntax of Linux system calls is different to MS-DOS ones: Function EAX=9 of int 80h (link a file on the disk) is a completely different function than AH=9 of int 21h (print a string on the screen).
You have tagged your question with the tag att. There are however also assemblers for Linux that can assemble intel-style assembly code.

Why does VC++ 2010 often use ebx as a "zero register"?

Yesterday I was looking at some 32 bit code generated by VC++ 2010 (most probably; don't know about the specific options, sorry) and I was intrigued by a curious recurring detail: in many functions, it zeroed out ebx in the prologue, and it always used it like a "zero register" (think $zero on MIPS). In particular, it often:
used it to zero out memory; this is not unusual, as the encoding for a mov mem,imm is 1 to 4 bytes bigger than mov mem,reg (the full immediate value size has to be encoded even for 0), but usually (gcc) the necessary register is zeroed out "on demand", and kept for more useful purposes otherwise;
used it for compares against zero - as in cmp reg,ebx. This is what stroke me as really unusual, as it should be exactly the same as test reg,reg, but adds a dependency to an extra register. Now, keep in mind that this happened in non-leaf functions, with ebx being often pushed (by the callee) on and off the stack, so I would not trust this dependency to be always completely free. Also, it also used test reg,reg in the exact same fashion (test/cmp => jg).
Most importantly, registers on "classic" x86 are a scarce resource, if you start having to spill registers you waste a lot of time for no good reason; why waste one through all the function just to keep a zero in it? (still, thinking about it, I don't remember seeing much register spillage in functions that used this "zero-register" pattern).
So: what am I missing? Is it a compiler blooper or some incredibly smart optimization that was particularly interesting in 2010?
Here's an excerpt:
; standard prologue: ebp/esp, SEH, overflow protection, ... then:
xor ebx, ebx
mov [ebp+4], ebx ; zero out some locals
mov [ebp], ebx
call function_1
xor ecx, ecx ; ebx _not_ used to zero registers
cmp eax, ebx ; ... but used for compares?! why not test eax,eax?
setnz cl ; what? it goes through cl to check if eax is not zero?
cmp ecx, ebx ; still, why not test ecx,ecx?
jnz function_body
push 123456
call throw_something
function_body:
mov edx, [eax]
mov ecx, eax ; it's not like it was interested in ecx anyway...
mov eax, [edx+0Ch]
call eax ; virtual method call; ebx is preserved but possibly pushed/popped
lea esi, [eax+10h]
mov [ebp+0Ch], esi
mov eax, [ebp+10h]
mov ecx, [eax-0Ch]
xor edi, edi ; ugain, registers are zeroed as usual
mov byte ptr [ebp+4], 1
mov [ebp+8], ecx
cmp ecx, ebx ; why not test ecx,ecx?
jg somewhere
label1:
lea eax, [esi-10h]
mov byte ptr [ebp+4], bl ; ok, uses bl to write a zero to memory
lea ecx, [eax+0Ch]
or edx, 0FFFFFFFFh
lock xadd [ecx], edx
dec edx
test edx, edx ; now it's using the regular test reg,reg!
jg somewhere_else
Notice: an earlier version of this question said that it used mov reg,ebx instead of xor ebx,ebx; this was just me not remembering stuff correctly. Sorry if anybody put too much thought trying to understand that.
Everything you commented on as odd looks sub-optimal to me. test eax,eax sets all flags (except AF) the same as cmp against zero, and is preferred for performance and code-size.
On P6 (PPro through Nehalem), reading long-dead registers is bad because it can lead to register-read stalls. P6 cores can only read 2 or 3 not-recently-modified architectural registers from the permanent register file per clock (to fetch operands for the issue stage: the ROB holds operands for uops, unlike on SnB-family where it only holds references to the physical register file).
Since this is from VS2010, Sandybridge wasn't released yet, so it should have put a lot of weight on tuning for Pentium II/III, Pentium-M, Core2, and Nehalem where reading "cold" registers is a possible bottleneck.
IDK if anything like this ever made sense for integer regs, but I don't know much about optimizing for CPUs older than P6.
The cmp / setz / cmp / jnz sequence looks particularly braindead. Maybe it came from a compiler-internal canned sequence for producing a boolean value from something, and it failed to optimize a test of the boolean back into just using the flags directly? That still doesn't explain the use of ebx as a zero-register, which is also completely useless there.
Is it possible that some of that was from inline-asm that returned a boolean integer (using a silly that wanted a zero in a register)?
Or maybe the source code was comparing two unknown values, and it was only after inlining and constant-propagation that it turned into a compare against zero? Which MSVC failed to optimize fully, so it still kept 0 as a constant in a register instead of using test?
(the rest of this was written before the question included code).
Sounds weird, or like a case of CSE / constant-hoisting run amok. i.e. treating 0 like any other constant that you might want to load once and then reg-reg copy throughout the function.
Your analysis of the data-dependency behaviour is correct: moving from a register that was zeroed a while ago essentially starts a new dependency chain.
When gcc wants two zeroed registers, it often xor-zeroes one and then uses a mov or movdqa to copy to the other.
This is sub-optimal on Sandybridge where xor-zeroing doesn't need an execution port, but a possible win on Bulldozer-family where mov can run on the AGU or ALU, but xor-zeroing still needs an ALU port.
For vector moves, it's a clear win on Bulldozer: handled in register rename with no execution unit. But xor-zeroing of an XMM or YMM register still needs an execution port on Bulldozer-family (or two for ymm, so always use xmm with implicit zero-extension).
Still, I don't think that justifies tying up a register for the duration of a whole function, especially not if it costs extra saves/restores. And not for P6-family CPUs where register-read stalls are a thing.

How to limit the address space of 32bit application on 64bit Linux to 3GB?

Is it possible to make 64bit Linux loader to limit the address space of the loaded 32bit program to some upper limit?
Or to set some holes in the address space that to not be allocated by the kernel?
I mean for specific executable, not globally for all processes, neither through kernel configuration. Some code or ELF executable flags are examples of appropriate solution.
The limit should be forced for all loaded shared libraries as well.
Clarification:
The problem I want to fix is that my code uses the numbers above 0xc0000000 as a handle values and I want to clearly distinct between handle values and memory addresses, even when the memory addresses are allocated and returned by some third party library function.
As long as the address space in 64bit Linux is very close to 4G limit, there is no enough addressing space left for the handle values.
On the other hand 3GB or even less is far enough for all my needs.
OK, I found the answer of this question elsewhere.
The solution is to change the "personality" of your program to PER_LINUX32_3GB, using the Linux system call sys_personality.
But there is a problem. After switching to PER_LINUX32_3GB Linux kernel will not allocate space in the upper 1GB, but the already allocated space, for example the application stack, remains there.
The solution is to "restart" your program through sys_execve system call.
Here is the code where I packed everything in one:
proc ___SwitchLinuxTo3GB
begin
cmp esp, $c0000000
jb .finish ; the system is native 32bit
; check the current personality.
mov eax, sys_personality
mov ebx, -1
int $80
; and exit if it is what intended
test eax, ADDR_LIMIT_3GB
jnz .finish ; everything is OK.
; set the needed personality
mov eax, sys_personality
mov ebx, PER_LINUX32_3GB
int $80
; and restart the process
mov eax, [esp+4] ; argument count
mov ebx, [esp+8] ; the filename of the executable.
lea ecx, [esp+8] ; the arguments list.
lea edx, [ecx+4*eax+4] ; the environment list.
mov eax, sys_execve
int $80
; if something gone wrong, it comes here and stops!
int3
.finish:
return
endp

Resources