IA 32 read command line argument - linux

I am trying to read command line arguments with assembly code for IA 32. I found an explanation of how to do it here http://www.paladingrp.com/ia32.shtml. I am able to use the stack pointer to get the number of arguments but I am not able to get the value of the arguments.
Here is what I am trying to do:
movl 8(%esp), %edx # Move pointer to argument 1 to edx
movl (%edx), %ebx # Move value of edx to ebx
movl $1, %eax # opcode for exit system call in eax
int $0x80 # return
Am I getting the correct pointer? If so, how do I get the value of it? If not, how do I get the correct pointer?

movl (%edx), %ebx # Move value of edx to ebx
That doesn't move value of EDX to EBX (the comment is incorrect).
That dereferences pointer in EDX, and puts the result of dereference into EBX. So if you invoked your program with ./a.out foo, then EBX will end up being 0x006f6f66 (== '\0oof' ("foo\0" in little-endian)).
I am guessing that's not what you wanted, but your question is not very clear about what you are expecting to happen where.

Related

Convert a string of digits to an integer by using a subroutine

Assembly language program to read in a (three-or-more-digit) positive integer as a string and convert the string to the actual value of the integer.
Specifically, create a subroutine to read in a number. Treat this as a string, though it will be composed of digits. Also, create a subroutine to convert a string of digits to an integer.
Do not have to test for input where someone thought i8xc was an integer.
I am doing it like this. Please help.
.section .data
String:
.asciz "1234"
Intg:
.long 0
.section .text
.global _start
_start:
movl $1, %edi
movl $String, %ecx
character_push_loop:
cmpb $0, (%ecx)
je conversion_loop
movzx (%ecx), %eax # move byte from (%ecx) to eax
pushl %eax # Push the byte on the stack
incl %ecx # move to next byte
jmp character_push_loop # loop back
conversion_loop:
popl %eax # pop off a character from the stack
subl $48, %eax # convert to integer
imul %edi, %eax # eax = eax*edi
addl %eax, Intg
imul $10, %edi
decl %ecx
cmpl $String, %ecx # check when it get's to the front %ecx == $String
je end # When done jump to end
jmp conversion_loop
end:
pushl Intg
addl $8, %esp # clean up the stack
movl $0, %eax # return zero from program
ret
Also, I am unable to get the output. I am getting a Segmentation Fault. I am not able to find out what is the error in my code.
Proper interaction with operating system is missing.
In the end: you pushed the result but the following addl $8, %esp invalidates the pushed value and the final ret incorrectly leads the instruction flow to whatever garbage was in the memory pointed by SS:ESP+4 at the program entry.
When you increase the stack pointer, you cannot rely that data below ESP will survive.
Your program does not interact with its user, if you want it to print something, use system function to write.
print_String:
mov $4,eax ; System function "sys_write".
mov $1,ebx ; Handle of the standard output (console).
mov $String,ecx ; Pointer to the text string.
mov $4,edx ; Number of bytes to print.
int 0x80 ; Invoke kernel function.
end:mov $1,eax ; System function "sys_exit".
mov (Intg),ebx ; Let your program terminate gracefully with errorlevel Intg.
int 0x80 ; Invoke kernel function.

Loading value at address into register

As a learning exercise, I've been handwriting assembly. I can't seem to figure out how to load the value of an address into a register.
Semantically, I want to do the following:
_start:
# read(0, buffer, 1)
mov $3, %eax # System call 3 is read
mov $0, %ebx # File handle 0 is stdin
mov $buffer, %ecx # Buffer to write to
mov $1, %edx # Length of buffer
int $0x80 # Invoke system call
lea (%ecx, %ecx), %edi # Pull the value at address into %edi
cmp $97, %edi # Compare to 'a'
je done
I've written a higher-level implementation in C:
char buffer[1];
int main()
{
read(0, buffer, 1);
char a = buffer[0];
return (a == 'a') ? 1 : 0;
}
But compiling with gcc -S produces assembly that doesn't port well into my implementation above.
I think lea is the right instruction I should be using to load the value at the given address stored in %ecx into %edi, but upon inspection in gdb, %edi contains a garbage value after this instruction is executed. Is this approach correct?
Instead of the lea instruction, what you need is:
movzbl (%ecx), %edi
That is, zero extending into the edi register the byte at the memory address contained in ecx.
_start:
# read(0, buffer, 1)
mov $3, %eax # System call 3 is read
mov $0, %ebx # File handle 0 is stdin
mov $buffer, %ecx # Buffer to write to
mov $1, %edx # Length of buffer
int $0x80 # Invoke system call
movzbl (%ecx), %edi # Pull the value at address ecx into edi
cmp $97, %edi # Compare to 'a'
je done
Some advice
You don't really need the movz instruction: you don't need a separate load operation, since you can compare the byte in memory pointed by ecx directly with cmp:
cmpb $97, (%ecx)
You may want to specify the character to be compared against (i.e., 'a') as $'a' instead of $97 in order to improve readability:
cmpb $'a', (%ecx)
Avoiding conditional branches is usually a good idea. Immediately after performing the system call, you could use the following code that uses cmov for determining the return value, which is stored in eax, instead of performing a conditional jump (i.e., the je instruction):
xor %eax, %eax # set eax to zero
cmpb $'a', (%ecx) # compare to 'a'
cmovz %edx, %eax # conditionally move edx(=1) into eax
ret # eax is either 0 or 1 at this point
edx was set to 1 prior to the system call. Therefore, this approach above relies on the fact that edx is preserved across the system call (i.e., the int 0x80 instruction).
Even better, you could use sete on al after the comparison instead of the cmov:
xor %eax, %eax # set eax to zero
cmpb $'a', (%ecx) # compare to 'a'
sete %al # conditionally set al
ret # eax is either 0 or 1 at this point
The register al, which was set to zero by means of xor %eax, %eax, will be set to 1 if the ZF flag was set by the cmp (i.e., if the byte pointed by ecx is 'a'). With this approach you don't need to care about thinking whether the syscall preserves edx or not, since the outcome doesn't depend on edx.

can someone explain me the stack of this code?

English it's not my first language so if i spell wrong some words sorry. I've some trouble with the stack, all codes that i will put here works perfectly.
This code for example it's easy and i understand the stack of it.
.globl f
f:
push %ebx
movl 8(%esp), %eax
movl 12(%esp), %ebx
addl %ebx, %eax
ret
STACK
-------
VAR Y --> ESP + 12
-------
VAR X --> ESP + 8
-------
RET --> RETURN
-------
%EBX --> %ESP
-------
But with this code i've some t
.code32
.globl f
f:
pushl %ebx
movl 8(%esp), %ebx
subl $8, %esp # Creo posto nella stack per i parametri
movl $1, (%esp)
movl $2, 4(%esp)
call a
addl %ebx, %eax
addl $8, %esp #Tolgo posto nella stack
popl %ebx
ret
The code work perfectly but i've many question about that?. Where is %ebx and ret on stack now?
Code of asm transalted in c:
int f(int x){
return x + g(y,z);
}
And this is the stack that i've made
STACK
--------
8(%esp) --> x parameter of function f
--------
4(%esp) --> z parameter of function g
--------
(%esp) --> y parameter of funcion g
--------
So the question now is where are %ebx and ret on this stack now?
The first code will return to old ebx value (probably not a valid address), not to the original return address, it's missing pop ebx ahead of ret.
In the second call the memory at ss:esp address, before call a instruction, contains:
dword 1 +0 (current esp)
dword 2 +4
dword old_ebx_value +8
dword return_address_from_f +12
dword x +16
... older stack content ...
Your "esp+x" notation doesn't work, as the esp does change dynamically, so if you want to describe stack like that, you have to say at which position in the code (which value of esp) you are using. Ie. at the entry of f the mov eax,[esp+4] will load the "x", but just one instruction later after push ebx the same thing is achieved by mov eax,[esp+8] (Intel syntax, convert to that "machine" gas/at&t syntax by yourself, I'm human).
But even then, if you will picture it as memory values, it is dynamically changing with every push or write to memory, so you still have to specify at which point of execution you are describing the stack (like I did ahead of call a, because after call a there's the address of instruction addl %ebx, %eax written ahead of that value 1 and code at a is not shown in question.
Anyway the old ebx and return address are in the memory at the same place all the time (unless a overwrites them), it's not the content that moves. It's the pointer esp that is being adjusted by push/pop/add/sub. (the memory content will stay for some undefined period of time even after you pop it, it's just not safe to assume how long it takes other code to overwrite it, in case the SW interrupt handlers are using the app stack it may be overwritten any time, although in x86 32b mode usually the app has it's own stack, so then those values will probably stay there until you overwrite them by next push or call or some other way).
Finally, just make those things to compile, and run them in debugger, put memory view to ss:esp-32 at the beginning, and watch how the memory is being written to by instructions like call or push, and how esp does change to point to the "top of stack". It's usually much easier to "watch it" in debugger, than reading text like my answer.

Assembly: writing multiple lines into a buffer

I have a problem with my second ever program in Assembly. The task is to read multiple lines of text from a keyboard and write them down into a buffer (.comm). After an empty line is entered, program should echo in a loop each previously typed line of text. A limit for one line of text is 100 charcters. However, I get a "program received signal sigsegv segmentation fault / 0x00000000006000a5 in check ()" error message.
My idea is to create a buffer of size 5050 bytes. Each line of text can have at most 100 characters. Here is a visual structure of the buffer:
[First line ][0][Second line ][0][Short ][0][Text ][0]
UPDATE: According to Jester's reply (thanks!), I've slightly modified my idea for the program. I abandoned the idea of 100 bytes per line. I'll simply place them one after another, simply separating them with a special character (0). So a new structure of the buffer would be:
[First line of text][0][No matter how long it is][0][short][0]
However, I've got a problem with appending the special "0" character to the BUFFER in "add_separator" part. I also wonder if it's really necessary since we add the "\n" new line indicator into the BUFFER asswell?
Also, the part when I check if the entered line of code is empty never returns true (empty line state) so the program keeps loading and loading new lines. Did I miss anything?
Here is an updated bit of code:
SYSEXIT = 1
SYSREAD = 3
SYSWRITE = 4
STDOUT = 1
STDIN = 0
EXIT_SUCCESS = 0
.align 32
.data #data section
.comm BUFFER, 5050 #my buffer of a size of 5050 bytes
BUFFER_len = 5050
.global _start
_start:
mov $0,%esi
read:
mov $SYSREAD, %eax
mov $STDIN, %ebx
mov BUFFER(%esi), %ecx
mov $1, %edx
int $0x80
check:
cmp $0, %eax # check if entered line is empty
je end # if yes, end program
lea BUFFER(%esi), %ecx # move the latest character for comparison
cmp '\n', %ecx # check if it's a line end
inc %esi # increment the iterator
je end
jmp read
end:
mov $SYSWRITE, %eax
mov $STDOUT, %ebx
mov $BUFFER, %ecx
mov $BUFFER_len, %edx
int $0x80
mov $SYSEXIT, %eax
mov $EXIT_SUCCESS, %ebx
int $0x80
Thanks in advance for any tips!
Filip
A few things:
Trust me, you don't want to use esp as a general purpose register as a beginner
The read system call will read at most as many bytes as you specify (in this case BUFFER_LEN) and it will return the number of bytes read. You should pass in 1 instead, so you can read char-by-char.
Adding 100 for the next word (you really mean line, right?) isn't terribly useful, just append each character continuously since that's how you want to print them too.
cmp '\n', %al would try to use the '\n' as an address, you want cmp $'\n', %al to use an immediate
Learn to use a debugger to find your own mistakes
using jg to jump over a jle is really unnecessary, just keep the jle and let the execution continue normally otherwise

Bogus Results from Simple Assembly Program on FreeBSD System

I've been having problems getting even the simplest of assembly programs that I write on Linux to run on my FreeBSD machine. Here's the offending code (I'm trying to keep this as simple as possible):
#counts to sixty
.section .data
.section .text
.global _start
_start:
movl $1, %ecx #move $1 into ecx
movl $1, %eax
start_loop:
addl %ecx, %eax #add ecx to eax
cmpl $60, %eax #compare $60 and eax...
je end_loop #if eax = 60 go to end_loop
cmpl $60, %eax #
jle start_loop #jump if eax is < $60...
jmp start_loop #...to start_loop
end_loop:
movl %eax, %ebx #move the value of eax into ebx because ebx holds
#the return value
movb $1, %al #Move $1 into eax (int 1 is the value for the
#exit() syscall
int $0x80
The Linux machine returns the expected resulted which is sixty, whereas the FreeBSD machine consistently returns 164 for the return code. Does anybody know why this is? If so, can you please explain to me what is happening? Also, I should mention that they are both indeed running x86 CPUs. Thanks in advance :)
Refer to the FreeBSD Developer's handbook, and you need to do:
push %eax
mov $1, %eax
push %eax
int $0x80
because:
only the system call vector is passed via register %eax, all arguments are on the stack
the FreeBSD default syscall expects an additional word on the stack, which would be a dummy for inlined uses of int $0x80 but a return address where you do a syscall via a call kernel_entry trampoline (that then can do int $0x80; ret).
If you want to use the Linux convention (some syscall args in regs, called "Alternative Calling convention" in the manual), you have to brand the executable so that the system knows you're using Linux-style syscalls.

Resources