I am writing code to convert a hex digit (A-F) to decimal in assembly. I managed to write it for the 8086 emulator, but I need it for Linux. I need help.
The code works absolutely fine on the 8086 emulator on Windows, but I am unable to convert it to Linux syntax. I am not familiar with the Linux syntax for assembly.
This is my 8086 code.
org 100h
.model small
.stack 100h
.data
msg1 db 'Enter a hex digit:$'
msg2 db 'In decimal it is:$'
.code
main proc
mov ax,@data
mov ds,ax
lea dx,msg1
mov ah,9
int 21h
mov ah,1
int 21h
mov bl,al
sub bl,17d ; 'A'..'F' (65..70) -> '0'..'5' (48..53): the ones digit
mov ah,2
mov dl,0dh
int 21h
mov dl,0ah
int 21h
lea dx,msg2
mov ah,9
int 21h
mov dl,49d ;print 1 at first
mov ah,2
int 21h
mov dl,bl
mov ah,2 ; print next value of hex after 1
int 21h
main endp
end main
ret
To do such a conversion, you have to consider two things:
Your code is segmented 16-bit assembly code. Linux does not use segmented 16-bit code, but either flat 32-bit or 64-bit code.
"Flat" means that the selectors (cs, ds, es, ss which are not "segment" registers but "selectors" in 32-bit mode) have a pre-defined value which should not be changed.
In 32-bit mode the CPU instructions (and therefore the assembler instructions) are a bit different from 16-bit mode.
Interrupts are environment dependent. int 21h, for example, is an MS-DOS interrupt, which means that int 21h is only available if the operating system is compatible with MS-DOS or you use some software (such as "8086 emu") that emulates MS-DOS.
x86 Linux uses int 80h in 32-bit programs to call operating system functions. Unfortunately, many quite "handy" functions of int 21h are not present in Linux. One example would be keyboard input:
If you don't want the default behavior (complete lines are read with echo; the program can read the first character of a line when a complete line has been typed), you'll have to send a so-called ioctl()-code to the system...
And of course the syntax of Linux system calls is different to MS-DOS ones: Function EAX=9 of int 80h (link a file on the disk) is a completely different function than AH=9 of int 21h (print a string on the screen).
You have tagged your question with the tag att. There are, however, also assemblers for Linux (NASM, for example) that can assemble Intel-style assembly code.
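The digit arithmetic itself does not depend on the operating system; only the input and output calls change. As a minimal sketch of what the 8086 code computes (Python is used here purely for illustration, and the function name hex_letter_to_decimal is made up for this example):

```python
def hex_letter_to_decimal(ch):
    """Convert one hex letter 'A'..'F' to its decimal text, mirroring the assembly."""
    if len(ch) != 1 or not ("A" <= ch <= "F"):
        raise ValueError("expected a single uppercase hex letter A-F")
    ones = chr(ord(ch) - 17)   # same as `sub bl,17d`: 'A' (65) becomes '0' (48)
    return "1" + ones          # the assembly prints '1' first, then the ones digit


print(hex_letter_to_decimal("A"))  # 10
print(hex_letter_to_decimal("F"))  # 15
```

Like the original program, this only handles the uppercase letters A to F; the digits 0 to 9 and lowercase letters would need their own cases.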
Related
I'm trying to make a DOS program in NASM that uses interrupt 10h to display a pixel cycling through the 16 available colors in the top left corner. I also use interrupt 21h to only make the program run every 1/100 seconds (100 fps).
segment .data
pixelcolor: db 0
pixelx: dw 100
pixely: dw 100
timeaux: db 0 ; used later on to force the program to run at 100fps
segment .text
global _start
_start:
mov ah,00h
mov al,0dh
int 10h
mov ah,0bh
mov bh,00h
mov bl,00h
int 10h
.infinite:
mov ah,2ch
int 21h ; get system time
cmp dl,timeaux ; if 1/100 seconds haven't passed yet...
je .infinite ; ...skip current frame
; else, continue normally
mov byte[timeaux],dl
mov ah,00h
mov al,0dh
int 10h
mov ah,0bh
mov bh,00h
mov bl,00h
int 10h
mov ah,0ch
mov al,pixelcolor
mov cx,pixelx
mov dx,pixely
int 10h
inc byte[pixelcolor]
jmp .infinite
However, when I actually run the program in DOSBox, the pixel just stays red. Does anyone know why my infinite loops aren't working? (Note: I'm very new to NASM, so honestly I'm not even surprised my programs only work 15% of the time.)
The problem isn't actually the loop itself. What the loop is doing each iteration is the problem. Some issues and observations I have are:
Since this is a DOS COM program, you will need an org 100h at the top, because the DOS loader loads a COM program at offset 100h of the current program segment. Without this, the offsets of your data will be incorrect, so data is read from or written to the wrong memory locations.
You have a problem with mov al,pixelcolor. It needs to be mov al,[pixelcolor]. Without square brackets (see footnote 1), the offset of pixelcolor is moved to AL, not the value stored at that offset. The same goes for pixelx and pixely, and for cmp dl,timeaux in the timing check, which should be cmp dl,[timeaux]. Your code prints the same pixel color (red in your case) to the wrong place (see footnote 2) on the screen repeatedly. This code:
mov ah,0ch
mov al,pixelcolor
mov cx,pixelx
mov dx,pixely
int 10h
inc byte[pixelcolor]
should be:
mov ah,0ch
mov al,[pixelcolor]
mov cx,[pixelx]
mov dx,[pixely]
int 10h
inc byte[pixelcolor]
Note that, by default, the timer only ticks 18.2 times a second (about 55 ms per tick). That is coarser than the 1/100 of a second you are aiming for.
Some versions of DOS may always return 0 for the 1/100 of a second value.
Use of the BIOS to write pixels to the screen may make coding simpler (it abstracts away differences in the video modes) but will be quite slow compared to writing pixels directly to memory.
I would recommend Borland's Turbo Debugger (TD) for debugging DOS software. Turbo Debugger is included in a number of Borland's DOS C/C++ compiler suites.
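To see why the missing brackets freeze the color, here is a toy model of the two addressing forms, written in Python for illustration. The offset 0x0150 for pixelcolor is a made-up value (the real one depends on how the program is laid out), and the model assumes the assembler truncates the offset to its low byte when it is loaded into AL:

```python
PIXELCOLOR = 0x0150  # hypothetical offset of the label pixelcolor


def simulate(frames):
    """Return (AL without brackets, AL with brackets) for each loop iteration."""
    memory = bytearray(0x200)  # toy model of the program segment, all zeros
    seen = []
    for _ in range(frames):
        without_brackets = PIXELCOLOR & 0xFF  # mov al,pixelcolor: low byte of the offset
        with_brackets = memory[PIXELCOLOR]    # mov al,[pixelcolor]: the stored byte
        seen.append((without_brackets, with_brackets))
        memory[PIXELCOLOR] = (memory[PIXELCOLOR] + 1) & 0xFF  # inc byte[pixelcolor]
    return seen


print(simulate(3))  # [(80, 0), (80, 1), (80, 2)]
```

The first column never changes because it is a constant derived from the address; only the second column follows the inc.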
Footnotes
1. The use of brackets [] in NASM differs from MASM/TASM/JWASM.
2. Although your question says you want to write to the upper left of the screen, the code suggests you really intended to write the pixel at coordinate 100,100.
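The effect of the 18.2 Hz timer on the frame rate can be worked out with a little arithmetic. The sketch below (Python, for illustration only) assumes the hundredths value returned in DL is derived from the tick count, as the resolution remark above implies; the function names are invented for this example:

```python
TICK_HZ = 18.2  # default timer tick rate cited above (about 55 ms per tick)


def hundredths_seen(ticks):
    """Hundredths-of-a-second value visible at each of the first `ticks` ticks."""
    return [int(k / TICK_HZ * 100) % 100 for k in range(ticks)]


def frames_per_second():
    """Count how often the hundredths value changes within the first second."""
    seen = hundredths_seen(int(TICK_HZ) + 1)  # ticks 0..18 fall inside one second
    return sum(1 for a, b in zip(seen, seen[1:]) if a != b)


print(hundredths_seen(4))    # [0, 5, 10, 16]
print(frames_per_second())   # 18
```

Under that assumption the value jumps by 5 or 6 at a time, so a loop that waits for DL to change runs about 18 times a second rather than 100.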
section .text
global _start ;must be declared for using gcc
_start: ;tell linker entry point
mov edx, len ;message length
mov ecx, msg ;message to write
mov ebx, 1 ;file descriptor (stdout)
mov eax, 4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax, 1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!',0xa ;our dear string
len equ $ - msg ;length of our dear string
This is a basic 32-bit x86 Linux assembly program that prints "Hello, World!" on the screen (standard output). Build and run it with
nasm -felf -g -Fdwarf hello.asm
gcc -g -m32 -nostdlib -static -o hello hello.o
./hello
(Editor's note: or gdb ./hello to debug / single-step it. That's why we used nasm -g -Fdwarf and gcc -g. Or use layout reg inside GDB for disassembly+register view that doesn't depend on debug symbols. See the bottom of https://stackoverflow.com/tags/x86/info)
Now I want to ask how this code works behind the scenes. For example, why are all these instructions needed
_start: ;tell linker entry point
mov edx, len ;message length
mov ecx, msg ;message to write
mov ebx, 1 ;file descriptor (stdout)
mov eax, 4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax, 1 ;system call number (sys_exit)
int 0x80 ;call kernel
just to print "Hello, World!"? And what about the statement
_start:
above? Is it the main function?
And the statement
int 0x80
why is it used at all? Can you give me a detailed explanation of how this program works?
In machine code, there are no functions. At least, the processor knows nothing about functions. The programmer can structure his code as he likes. _start is a symbol, which is just a name for a location in your program. Symbols are used to refer to locations whose address you don't know yet; they are resolved during linking. The symbol _start is used as the entry point (cf. this answer), which is where the operating system jumps to start your program. Unless you specify the entry point in some other way, every program must contain _start. The other symbols your program uses are msg, which the linker resolves to the address where the string Hello, world! resides, and len, which is the length of msg.
The rest of the program does the following things:
Set up the registers for the system call write(1, msg, len). write has system call number 4 which is stored in eax to let the operating system know you want system call 4. This system call writes data to a file. The file descriptor number supplied is 1 which stands for standard output.
Perform a system call using int $0x80. This instruction interrupts your program, the operating system picks this up and performs the function whose number is stored in eax. It's like a function call that calls into the OS kernel. The calling convention is different from other functions, with args passed in registers.
Set up the registers for the system call _exit(?). Its system call number is 1, which goes into eax. Sadly, the code forgets to set the argument for _exit, which should be 0 to indicate success. Whatever was in ebx before is used instead, which here is 1, left over from the file descriptor passed to write.
Perform a system call using int $0x80. Because _exit ends the program, it does not return. Your program ends here.
The directive db tells the assembler to place the following data into the program where we currently are. This places the string Hello, world! followed by a newline into the program so we can tell the write system call to write that string.
The line len equ $ - msg tells the assembler that len is the difference between $ (where we currently are) and msg. This is defined so we can pass to write how long the text we want to print is.
Everything after a semicolon (;) in the program is a comment ignored by the assembler.
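The same two system calls can be reached from a high-level language, which may make the register setup easier to follow. This is a rough Python equivalent, not a translation of the assembly; the helper name say_hello is made up for the example, and os.write is simply Python's wrapper around the write system call:

```python
import os

MSG = b"Hello, world!\n"   # same bytes as `msg db 'Hello, world!',0xa`
LEN = len(MSG)             # plays the role of `len equ $ - msg`


def say_hello(fd=1):
    """Issue write(fd, msg, len) and return the byte count the kernel reports."""
    return os.write(fd, MSG[:LEN])
```

Calling say_hello() corresponds to the first int 0x80, with fd=1 being the value placed in ebx. A following os._exit(0) would correspond to the second one, this time with the status explicitly set to 0.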
For a task I need to create simple shellcode, but it must not contain the byte \x80.
Note: to make a system call on Linux, like write or exit, you need, among other things, this line: int 0x80, which in the end produces shellcode that includes \x80.
Nevertheless I need to make system calls, so my idea now is to use a variable for the interrupt vector number. For example 0x40 and then multiply it with 2, so in the end there will be a \x40 but not a \x80 in the shellcode.
The problem is that the int is not taking a variable as an argument, I tried this for a test:
section .data
nr db 0x80
section .text
global _start
_start:
xor eax, eax
inc eax
xor ebx, ebx
mov ebx, 0x1
int [nr]
And get
error: invalid combination of opcode and operands
How could I get my idea working? Or do you have a different solution for the problem?
PS. sysenter and syscall are not working -> Illegal instruction
I am using nasm on an x86 32-bit machine.
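Maybe something like this, but never use it in serious code! The int instruction only accepts an immediate vector number, so the idea is to store int 0x7f (encoded as the bytes CD 7F) and increment its immediate byte at run time, just before it executes: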
format ELF executable
use32
entry start
segment executable writeable
start:
;<some code>
inc byte [ here + 1 ] ;<or some other math>
jmp here
here:
int 0x7f
segment readable writeable
(This is fasm code.)
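What the inc does to the instruction bytes can be checked on paper or with a few lines of Python. This is only a model of the byte patch, not shellcode; the helper name patched_int is invented for the example:

```python
INT_OPCODE = 0xCD  # x86 opcode for `int imm8`; the vector is the byte that follows


def patched_int(code):
    """Return a copy of a 2-byte `int imm8` instruction with its vector incremented."""
    patched = bytearray(code)
    patched[1] = (patched[1] + 1) & 0xFF   # what `inc byte [here + 1]` does in memory
    return bytes(patched)


stored = bytes([INT_OPCODE, 0x7F])         # `int 0x7f` as it sits in the shellcode
print(0x80 in stored)                      # False: no forbidden byte before patching
print(patched_int(stored).hex())           # cd80
```

The forbidden byte only exists in memory after the patch has run, which also means the code must live in a writable segment, as in the fasm listing.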
I am confused about why/how a value gets printed in x86 assembly in a Linux environment.
For example if I wish to print a value I would do this:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, msgLength
int 80h
Now I understand the numerical value 4 will make the system call to sys_write after the interrupt. But my question is, what is the significance of the 4? Is it loading the address of the decimal value 4 into eax? Or is it loading the value 4 into the eax register?
I am confused after reading I can transfer the value at an address to a register using the following instruction:
mov eax, [msg]
eax will now contain the bytes at the address of msg, but I would guess this format is not acceptable:
mov eax, [4]
So what is really happening when I move 4 into eax to print something?
Simply the value (number) 4 is loaded into eax, no magic there. The operating system will look at the value in eax to figure out what function you want. The system call number is a code that identifies which of the available kernel functions you want to use.
The Linux kernel keeps all the system call routines in an array of function pointers (the sys_call_table), and the value in eax is the index into that array, which tells the kernel which system call to run. The other registers, like ebx, ecx and edx, contain the parameters for that system call routine.
The int 80h instruction raises a software interrupt that switches the CPU from user mode to kernel mode, because the actual system call routine is a kernel-space function.
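The table lookup can be pictured with a toy dispatcher. This Python sketch is only a model of the idea and not kernel code: the two handlers are simplified stand-ins, and the names int_80h and output are made up for the example. The numbers 1 and 4 are the i386 numbers for exit and write, and 38 is the Linux errno ENOSYS:

```python
ENOSYS = 38   # Linux errno for "function not implemented"

output = []   # stands in for the terminal in this toy model


def sys_exit(status, _unused1, _unused2):
    return status


def sys_write(fd, buf, count):
    output.append((fd, buf[:count]))
    return count                      # number of bytes "written"


# Key = system call number, value = routine to run (a stand-in for the table).
SYS_CALL_TABLE = {1: sys_exit, 4: sys_write}


def int_80h(eax, ebx=0, ecx=b"", edx=0):
    """Pick the routine selected by eax and pass it ebx, ecx and edx."""
    handler = SYS_CALL_TABLE.get(eax)
    if handler is None:
        return -ENOSYS
    return handler(ebx, ecx, edx)


print(int_80h(4, 1, b"hi\n", 3))   # 3
```

So mov eax, 4 just selects entry 4; putting a different number in eax selects a different routine with the same register layout for its arguments.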
I wrote a simple assembly program that gets two integers from the user via a prompt, multiplies them together and prints that out. I wanted to do this directly with sys_read and not scanf so I could manually convert the input to an integer after removing the LF.
Here's the full source: http://pastebin.com/utnjTvNZ
In particular, what I want to do now is manually add a newline to the result of the multiplication, which has been converted back to its ASCII char equivalent. Initially, I thought I could just left shift 16 bits and add 0xA, leaving me with, for example, 0x0034000A on the stack for 2*2 (0x0034 is "4" in ASCII chars), followed by a null terminator and a LF. However, the LF is printing before the result. I figured this was an endianness thing, so I tried the reverse (0x000A0034) and that just printed some other ASCII char instead.
So, simply, how do I properly push a newline to the stack so that this is printed with a newline following the number when using sys_write? What I'm missing is how strings are stored on the stack... which I can't test because normally you just create a variable and push the address onto the stack.
I'm aware some things in here could be done better, cleaner and up-to-standards and whatnot. I understand things intuitively so it's something I just need to do to better understand the stack and Linux system calls in general.
Okay, so to answer my own question, thanks to the help of Jester: to add a newline to the 32-bit word I was displaying from memory, I had to understand endianness. Since I assembled for 32-bit, my program works with 32-bit words. The bytes within each word are written to memory "backwards" (least significant byte first), while the words themselves are still stored in "normal" order. For example, the words 0x0A290028 0x0A293928 print (NULL)LF(9)LF, that is, '(', a NUL byte, ')', LF, then '(', '9', ')', LF. The bytes are backwards but the words are not. sys_write, since it just takes a void *buf and isn't aware of strings, simply reads bytes from the buffer in ascending address order and spits them out.
What I was able to do was simply left-shift my single-digit product, for example 0x00000034, by 8 bits. This left me with 0x00003400. To that, I could add 0x000A0000. This would result in 0x000A3400, and the number "4" being printed followed by a newline.
So, the new procedure looks like this:
multprint:
mov eax, sys_write ;4
mov ebx, stdout ;1
mov ecx, resultstr
mov edx, resultstrLen
dec edx
int 0x80
pop eax ;multiplican't
pop ebx ;multiplicand
mul ebx
add eax, '0'
shl eax, 8 ;make room for () and LF
add eax, 0x0A290028
push eax
mov ecx, esp
;mov [num], eax ;use these two lines if I don't want to use the stack
;mov ecx, num
mov eax, sys_write
mov ebx, stdout
mov edx, 4
int 0x80
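The byte layout is easy to double-check outside the assembler. This Python sketch mirrors the add, shl and add steps for a single-digit product and then shows the bytes in the order sys_write reads them; the function names are made up for the example:

```python
def pack_result(product):
    """Build the 32-bit word the procedure pushes, for a single-digit product."""
    if not 0 <= product <= 9:
        raise ValueError("only single-digit products fit this layout")
    word = (product + ord("0")) << 8   # add eax,'0' then shl eax,8
    word += 0x0A290028                 # LF, ')', an empty slot, '('
    return word


def bytes_in_memory(word):
    """Bytes of a 32-bit word in ascending address order on little-endian x86."""
    return word.to_bytes(4, "little")


print(hex(pack_result(4)))               # 0xa293428
print(bytes_in_memory(pack_result(4)))   # b'(4)\n'
```

The lowest byte of the word (0x28, the opening parenthesis) ends up at the lowest address, so it is the first byte printed.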