Problems with ATOI in x86 NASM Linux assembly


I don't understand how to convert a string to an integer.
This is for homework, but I do not want answers to the problem (i.e. correct code). I'd really appreciate it if someone could explain just what it is that I'm doing wrong! :(
Thanks in advance!!!
I'm running Ubuntu 12.04 on a virtual machine, 32 bit.
I compile with:
nasm -f elf proj2.asm
I link with:
gcc -o proj2 proj2.o
and then run it:
./proj2
It displays the first number, but then gives me a segmentation fault when I try to use atoi.
I have a teacher who wants us to:
read in numbers from a text file arranged as so:
4
5
4
2
9
(there is whitespace before each integer)
As per his instructions: "Be sure to read seven (7) characters into the buffer to get the entire line. These are the five characters representing the number together with characters CR and LF. CR is the Carriage Return character with hex code 0x0D and LF is the Line Feed character with hex code 0x0A."
I've erased the spaces from the file, and tried to read it that way, but it didn't help.
The ints are to be read into an array on the stack, with a maximum of 250 ints. That's not the problem though :/
Below is my code so far.
BUFFERSIZE equ 10
section .data
file_name: db "/home/r/Documents/CS/project2/source/indata.txt", 0x00
file_mode: db "r", 0x00
output: db "%i",0xa
test: db "hello world",10
format: db "%u"
numToRead: db 1
temp: db "hi"
num:db "1",0,0
section .bss
fd: resd 4
length: resd 4
buffer resb BUFFERSIZE
;i was trying to use buffers and just
;read through each character in the string,
;but i couldn't get it to work
section .text
extern fopen
extern atoi
extern printf
extern fscanf
extern fgets
extern getc
extern fclose
global main
main:
;setting up stack frame
push ebp
mov ebp, esp
;opens file, store FD to eax
push file_mode
push file_name
call fopen
;save FD from eax into fd
push eax
mov ebx,eax
mov [fd],ebx
;ebx holds the file descriptor
;push in reverse order
push ebx
push numToRead
push temp
call fgets
push eax
call printf ;prints length (this works, I get a 4).
;Changing the number in the file changes the number displayed.
;I can also read in several lines, just can't get any ints!
;(So i can't do any comparisons or loops :/ )
;i shouldn't need to push eax here, right?
;It's already at the top of the stack from the printf
;pop eax
;push eax
call atoi
;calling atoi gives me a segmentation fault error
push eax
call printf
mov esp,ebp
pop ebp
ret
edit:
Interestingly, I can call atoi just fine. It's when I then try to
push eax
call atoi
push eax
call printf
that I get segmentation faults.

Unless I just can't see it on my cellphone, you're not balancing the stack after your calls. Those C functions are not stdcall, so you have to adjust the stack after each call. I do:
add esp, 4 * numofpushes
That might be the source of your segfaults.

edit: Interestingly, I can call atoi just fine. It's when I then try to
push eax
call atoi
push eax
call printf
that I get segmentation faults.
From the atoi reference: "On success, the function returns the converted integral number as an int value."
Passing any random integer (like 4) as the first argument of the following printf (i.e. the format string pointer) is not likely to end well.

Related

Why is the RDI register missing in this "Hello world" assembly program?

I found this "Hello" (shellcode) assembly program:
SECTION .data
SECTION .text
global main
main:
mov rax, 1
mov rsi, 0x6f6c6c6548 ; "Hello" is stored in reverse order "olleH"
push rsi
mov rsi, rsp
mov rdx, 5
syscall
mov rax, 60
syscall
And I found that mov rdi, 1 is missing. In other "hello world" programs that instruction appears so I would like to understand why this happens.

I was going to say it's an intentional trick or hack to save code bytes, using argc as the file descriptor. (1 if you run it from the shell without extra command line args). main(int argc, char**argv) gets its args in EDI and RSI respectively, in the x86-64 SysV calling convention used on Linux.
But given the other choices, like mov rax, 1 instead of mov eax, edi, it's probably just a bug that got overlooked because the code happened to work.
It would not work in real shellcode for a code-injection attack, where execution would probably reach this code with garbage other than 0, 1, or 2 in EDI. The shellcode test program on the tutorial you linked calls a const char[] of machine code as the only thing in main, which will normally compile to asm that doesn't touch RDI.
This code wouldn't work for code-injection attacks based on strcpy or other C-string overflows either, since the machine code contains 00 bytes as part of mov eax, 1, mov edx, 5, and the end of that character string.
Also, modern linkers don't link .rodata into an executable segment, and -zexecstack only affects the actual stack, not all readable memory. So that shellcode test won't work, although I expect it did when written. See How to get c code to execute hex machine code? for working ways, like using a local array and compiling with -zexecstack.
That tutorial is overall not great, probably something this guy wrote while learning. (But not as bad as I expected based on this bug and the use of Kali; it's at least decently written, just missing some tricks.)
Since you're using NASM, you don't need to manually waste time looking up ASCII codes and getting the byte order correct. Unlike some assemblers, mov rsi, "Hello" / push rsi results in those bytes being in memory in source order.
You also don't need an empty .data section, especially when making shellcode which is just a self-contained snippet of machine code which can't reference anything outside itself.
Writing a 32-bit register implicitly zero-extends to 64-bit. NASM optimizes mov rax,1 into mov eax,1 for you (as you can see in the objdump -d AT&T disassembly; use objdump -drwC -Mintel for Intel-syntax disassembly similar to NASM).
The following should work:
global main
main:
mov rax, `Hello\n  ` ; non-zero padding (two spaces) to fill 8 bytes
push rax
mov rsi, rsp
push 1 ; push imm8
pop rax ; __NR_write
mov edi, eax ; STDOUT_FD is also 1
lea edx, [rax-1 + 6] ; EDX = 6; using 3 bytes with no zeros
syscall
mov al, 60 ; assuming write success, RAX = 6, zero outside the low byte
;lea eax, [rdi-1 + 60] ; the safe way that works even with ./hello >&- to return -EBADF
syscall
This is fewer bytes of machine code than the original, and avoids \x00 bytes which strcpy would stop on. I changed the string to end with a newline, using NASM backticks to support C-style escape sequences like \n as 0x0a byte.
Running normally (I linked it into a static executable without CRT, despite it being called main instead of _start. ld foo.o -o foo):
$ strace ./foo > /dev/null
execve("./foo", ["./foo"], 0x7ffecdc70a20 /* 54 vars */) = 0
write(1, "Hello\n", 6) = 6
exit(1) = ?
Running with stdout closed to break the mov al, 60 __NR_exit hack:
$ strace ./foo >&-
execve("./foo", ["./foo"], 0x7ffe3d24a240 /* 54 vars */) = 0
write(1, "Hello\n", 6) = -1 EBADF (Bad file descriptor)
syscall_0xffffffffffffff3c(0x1, 0x7ffd0b37a988, 0x6, 0, 0, 0) = -1 ENOSYS (Function not implemented)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffffffffda} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
To still exit cleanly, use lea eax, [rdi-1 + 60] (3 bytes) instead of mov al, 60 (2 bytes) to set RAX according to the unmodified EDI, instead of depending on the upper bytes of RAX being zero which they aren't after an error return.
See also https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code

How do I handle the stack pointer register when returning from an assembly function call to a C program?

My program is composed of two files: main.c and core.s and runs on a 32 bit virtual machine of lubuntu linux.
Main.c takes in an integer and passes it to the assembly function void printFunc(int x). The assembly function in turn calls to a C function to check the parity of x. If x is even the function will print 4x and if x is odd it will print 8x. The print call must be done within the assembly function.
section .text
global printFunc
extern c_checkValidity
extern printf
section .data ; data section
fmt: db "%d", 10, 0 ; The printf format, "\n", '0'
printFunc:
push ebp ; code for handling stack I've seen from
mov ebp, esp ; other examples online
pushad
mov ebx, eax ; copy and input value
push eax ; Move input onto stack
call c_checkValidity ; Call C function, return value is in eax
cmp eax, 1 ; Check result, 1 indicates even
je multby4 ; If even, do mult by 4
jmp multby8 ; Otherwise odd, do mult by 8
multby4: ;INPUT WAS EVEN
sal ebx, 2 ; left shift by 2 is equivalent to multiplying by 4
jmp exitcode ; print and exit code
multby8: ;INPUT WAS ODD
sal ebx, 3 ; left shift by 3 is equivalent to multiplying by 8
jmp exitcode ; print and exit code
exitcode:
mov eax,ebx ; move value to eax to keep as default return value of func
push ebx ; Push final answer to the stack
push dword fmt ; Push print format to the stack
call printf ; Print answer
mov eax, ebx ; Copy final answer as return value
popad
mov esp, ebp ; return stack pointer to what it was before operation
pop ebp ; get rid of saved pointer
ret ; return state to caller
The integer input is received, parity tested, and printed to stdout correctly. A segfault occurs somewhere after call printf has successfully executed. When I use gdb to try and backtrace the segfault the report says "0x0804a0f in exitcode ()". Presumably this is the address of the code during operation that causes the segfault?
It is clear to me that I have failed to properly handle the stack pointer register (esp?) in some way. I've tried searching this site and others for examples of how to properly address the stack before returning control to the caller but to no avail. Certainly I would love to make the code work and any advice on how to fix the code is appreciated but I am primarily asking for an explanation on what I should be tracking and maintaining in general to return from an assembly function to a caller (extra appreciation if that explanation includes how to pass back values to the caller).

Scan an integer and print the interval (1, integer) in NASM

I am trying to learn assembly language on Ubuntu Linux 16.04 x64.
For now I have the following problem:
- scan an integer n and print the numbers from 1 to n.
For n = 5 I should have 1 2 3 4 5.
I tried to do it with scanf and printf but after I input the number, it exits.
The code is:
;nasm -felf64 code.asm && gcc code.o && ./a.out
SECTION .data
message1: db "Enter the number: ",0
message1Len: equ $-message1
message2: db "The numbers are:", 0
formatin: db "%d",0
formatout: db "%d",10,0 ; newline, nul
integer: times 4 db 0 ; 32-bits integer = 4 bytes
SECTION .text
global main
extern scanf
extern printf
main:
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
int 80h
mov rax, integer
loop:
push rax
push formatout
call printf
add esp, 8
dec rax
jnz loop
mov rax,0
ret
I am aware that in this loop I would have the inverse output (5 4 3 2 1 0), but I did not know how to set the condition.
The command I'm using is the following:
nasm -felf64 code.asm && gcc code.o && ./a.out
Can you please help me find where I'm going wrong?

There are several problems:
1. The parameters to printf, as discussed in the comments. In x86-64, the first few parameters are passed in registers.
2. printf does not preserve the value of eax.
3. The stack is misaligned.
4. rbx is used without saving the caller's value.
5. The address of integer is being loaded instead of its value.
6. Since printf is a varargs function, eax needs to be set to 0 before the call.
7. Spurious int 80h after the call to scanf.
I'll repeat the entire function in order to show the necessary changes in context.
main:
push rbx ; This fixes problems 3 and 4.
mov eax, 4
mov ebx, 1
mov ecx, message1
mov edx, message1Len
int 80h
mov rdi, formatin
mov rsi, integer
mov al, 0
call scanf
mov ebx, [integer] ; fix problems 2 and 5
loop:
mov rdi, formatout ; fix problem 1
mov esi, ebx
xor eax, eax ; fix problem 6
call printf
dec ebx
jnz loop
pop rbx ; restore caller's value
mov rax,0
ret
P.S. To make it count up instead of down, change the loop like this:
mov ebx, 1
loop:
<call printf>
inc ebx
cmp ebx, [integer]
jle loop

You are calling scanf correctly, using the x86-64 System V calling convention. It leaves its return value in eax. After successful conversion of one operand (%d), it returns with eax = 1.
... correct setup for scanf, including zeroing AL.
call scanf ; correct
int 80h ; insane: system call with eax = scanf return value
Then you run int 80h, which makes a 32-bit legacy-ABI system call using eax=1 as the code to determine which system call. (see What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?).
eax=1 / int 80h is sys_exit on Linux. (unistd_32.h has __NR_exit = 1). Use a debugger; that would have shown you which instruction was making your program exit.
Your title (before I corrected it) said you got a segmentation fault, but I tested on my x86-64 desktop and that's not the case. It exits cleanly using an int 80h exit system call. (But in code that does segfault, use a debugger to find out which instruction.) strace decodes int 0x80 system calls incorrectly in 64-bit processes, using the 64-bit syscall call numbers from unistd_64.h, not the 32-bit unistd_32.h call numbers.
Your code was close to working: you use the int 0x80 32-bit ABI correctly for sys_write, and only pass it 32-bit args. (The pointer args fit in 32 bits because static code/data is always placed in the low 2GiB of virtual address space in the default code model on x86-64. Exactly for this reason, so you can use compact instructions like mov edi, formatin to put addresses in registers, or use them as immediates or rel32 signed displacements.)
OTOH I think you were doing that for the wrong reason. And as @prl points out, you forgot to maintain 16-byte stack alignment.
Also, mixing system calls with C stdio functions is usually a bad idea. Stdio uses internal buffers instead of always making a system call on every function call, so things can appear out of order, or a read can be waiting for user input when there's already data in the stdio buffer for stdin.
Your loop is broken in several ways, too. You seem to be trying to call printf with the 32-bit calling convention (args on the stack).
Even in 32-bit code, this is broken, because printf's return value is in eax. So your loop is infinite, because printf returns the number of characters printed. That's at least two from the %d\n format string, so dec rax / jnz will always jump.
In the x86-64 SysV ABI, you need to zero al before calling printf (with xor eax,eax), if you didn't pass any FP args in XMM registers. You also have to pass args in rdi, rsi, ..., like for scanf.
You also add rsp, 8 after pushing two 8-byte values, so the stack grows forever. (But you never return, so the eventual segfault will be on stack overflow, not on trying to return with rsp not pointing to the return address.)
Decide whether you're making 32-bit or 64-bit code, and only copy/paste from examples for the mode and OS you're targeting. (Note that 64-bit code can and often does use mostly 32-bit registers, though.)
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) (which does include a NASM section with a handy asm-link script that assembles and links into a static binary). But since you're writing main instead of _start and are using libc functions, you should just link with gcc -m32 (if you decide to use 32-bit code instead of replacing the 32-bit parts of your program with 64-bit function-calling and system-call conventions).
See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64.

Why is data stored in memory reversed?

This is the source code I have:
section .data
msg: db "pppaaa"
len: equ $
section .text
global main
main:
mov edx,len
mov ecx,msg
mov ebx,1
mov eax,4
int 0x80
And when I debug this code I will see:
(gdb) info register ecx
ecx 0x804a010 134520848
(gdb) x 0x804a010
0x804a010 <msg>: 0x61707070
(gdb) x 0x804a014
0x804a014: 0x00006161
"70" here represents the character 'p' and "61" the character 'a' obviously.
What I am confused about is why the data at location 0x804a010 is 0x61707070 (appp), and why, moving 4 bytes forward to 0x804a014, the data is --aa.
I would expect to see (pppa) for the first location and (aa--) for the second location. Why is this the case?

GDB doesn't know that you have a bunch of chars. You are just asking it to look at a memory location and it is displaying what is there, defaulting to a 4-byte integer. It assumes the integer is stored least significant byte first, because that is how it is done on Intel, so you get your bytes reversed.
To fix this, use a format specifier with your x command, like this:
x/10c 0x804a010
(will print 10 chars beginning at 0x804a010).
help x in GDB will give more information.

Reading to and from arrays in Assembly?

I'm having a bit of trouble reading to and from arrays in assembly.
It's a fairly simple program (albeit at this point, far from finished). All I'm trying to do at this point is read a string of (what we're assuming are) numbers, convert it to a decimal number, and print it. Here's what I've got so far. As of now, it prints str1. After you enter a number and hit enter, it prints str1 again and freezes. Can anyone offer some insight as to what I'm doing wrong?
INCLUDE Irvine32.inc
.data
buffersize equ 80
buffer DWORD buffersize DUP (0)
str1 BYTE "Enter numbers to be added together. Press (Q) to Quit.", 0dh, 0ah,0;
str2 BYTE "The numbers entered were: ", 0dh, 0ah, 0
str3 BYTE "The total of numbers entered is: ", 0dh, 0ah, 0
error BYTE "Invalid Entry. Please try again.", 0dh, 0ah,0
value DWORD 0
.code
main PROC
mov edx, OFFSET str1
call Writestring
Input:
call readstring
mov buffer[edi], eax
cmp buffer[edi], 0
JL NOTDIGIT
cmp buffer[edi], 9
JG NOTDIGIT
call cvtDec
mov edx, buffer[edi]
call WriteString
jmp endloop
Notdigit:
mov edx, OFFSET error
call writestring
exit
cvtDec:
mov eax, buffer[edi]
AND eax,0Fh
mov buffer[edi],edx
ret
endloop:
main ENDP
END MAIN

First off, Mr. Irvine created the function called WriteString, but you use 2 variations - writestring and Writestring; you do use the correct case of the function in one place. Get into the habit of using the correct names of functions now, and it will cut down on bugs later.
Second, you created a label called Notdigit, and yet you use JL NOTDIGIT and JG NOTDIGIT in your code. Again, use the correct spelling. MASM should have given you an A2006 error "undefined symbol".
You also declared your entry point as main, but you close your code section with END MAIN instead of END main.
MASM only catches these case mismatches if you have it set up properly, either by adding option casemap:none at the top of your source, or by opening irvine32.inc and uncommenting the line that says OPTION CASEMAP:NONE.
Let's look at the ReadString procedure comment in irvine32.asm:
; Reads a string from the keyboard and places the characters
; in a buffer.
; Receives: EDX offset of the input buffer
; ECX = maximum characters to input (including terminal null)
; Returns: EAX = size of the input string.
; Comments: Stops when Enter key (0Dh,0Ah) is pressed. If the user
; types more characters than (ECX-1), the excess characters
; are ignored.
ReadString takes the address of the buffer to hold the input string in edx, but you are using the address of your prompt str1; maybe you meant to use buffer? You also did not put the size of the buffer into ecx.
You're using edi as an index into your buffer; what value does edi contain? You're trying to put the value of eax into it; what does eax contain? Both edi and eax probably contain garbage, not what you want.
Look at this carefully:
cvtDec:
mov eax, buffer[edi]
AND eax,0Fh
mov buffer[edi],edx
You're putting a value (that you think is the ASCII value of a number) into eax, then converting it to a decimal value... ok... Next, you are putting whatever is in edx back into your buffer. Is that what you want?
