Why is data stored in memory reversed? - linux

This is the source code I have:
section .data
msg: db "pppaaa"
len: equ $ - msg
section .text
global main
main:
mov edx,len
mov ecx,msg
mov ebx,1
mov eax,4
int 0x80
When I debug this code, I see:
(gdb) info register ecx
ecx 0x804a010 134520848
(gdb) x 0x804a010
0x804a010 <msg>: 0x61707070
(gdb) x 0x804a014
0x804a014: 0x00006161
"70" here represents the character 'p' and "61" the character 'a' obviously.
What I am confused about is why the data at location 0x804a010 is 0x61707070 (appp), and why, moving 4 bytes forward to 0x804a014, the data is --aa.
I would expect to see (pppa) for the first location and (aa--) for the second location. Why is this the case?

GDB doesn't know that you have a bunch of chars. You are just asking it to look at a memory location and it is displaying what is there, defaulting to a 4-byte integer. It assumes the integer is stored least significant byte first, because that is how it is done on Intel, so you get your bytes reversed.
To fix this, use a format specifier with your x command, like this:
x/10c 0x804a010
(will print 10 chars beginning at 0x804a010).
help x in GDB will give more information.
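To see the same effect outside the debugger, here is a minimal C sketch. The function names are made up for illustration. It builds a 32-bit value from four bytes the way a little-endian CPU does (lowest address = least significant byte) and prints both the word view and the byte view of the same memory.

```c
#include <stdint.h>
#include <stdio.h>

/* Assemble 4 bytes into a 32-bit value the way a little-endian CPU
   (such as x86) does: the byte at the lowest address is the least
   significant. Written with shifts so the result is the same on any host. */
uint32_t word_as_little_endian(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Print the two views of the same 4 bytes: one 32-bit word versus
   the individual bytes in address order. */
void show_views(const unsigned char *p)
{
    printf("as word : 0x%08x\n", (unsigned)word_as_little_endian(p));
    printf("as bytes:");
    for (int i = 0; i < 4; i++)
        printf(" %02x", p[i]);
    printf("\n");
}
```

Calling show_views on the bytes of "pppaaa" shows 70 70 70 61 in address order, while the word view is 0x61707070, which matches the gdb output in the question.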

Related

Getting confused about usage of labels in assembly for X86_64 linux: Why should we write mov [digit], al, but not mov digit, al?

Here's my code:
section .data
digit db 0,10
section .text
global _start
_start:
call _printRAXDigit
mov rax, 60
mov rdi, 0
syscall
_printRAXDigit:
add rax, 48
mov [digit], al
mov rax, 1
mov rdi, 1
mov rsi, digit
mov rdx, 2
syscall
ret
I have a question about the difference between [digit] and digit.
I have learned that labels (like digit in the code), represent the memory address of the data, and the operator "[]" acts like something to dereference the pointer, so it will load the value that the label points at to the destination.
For instance, mov rax, [digit] will load 0 into the rax register because digit points at the first element of the data (in this case, the integer 0).
However, in my code, it works when I write mov [digit], al, which means "store the value in al at the memory address digit", but I have no idea why we should use "[]" in this case. The first argument of mov must be a destination (like a register or a memory address), so I think it should be mov digit, al rather than mov [digit], al. It doesn't make sense to me why we use a value to get the value from another place rather than use a memory address to get the value.
That is my whole question. Please point out where my thinking is wrong, or correct my understanding of labels.
In NASM syntax (there are assemblers which use different notation, e.g. MASM/TASM use a different flavor of Intel syntax, and gas uses AT&T syntax) the following x86 instructions ...
mov esi, someAddress
mov esi, [someAddress]
mov [someAddress], esi
mov someAddress, esi ; see below
... (would) have the following meaning:
mov esi, someAddress
Write the number that represents the address where someAddress is stored to the register esi. So if someAddress is stored at address 1234 the value 1234 is written to esi.
mov esi, [someAddress]
Write the content of the memory to esi. So if someAddress is stored at address 1234 and the value stored at address 1234 is 5678 the value 5678 is written to esi.
You might also say: The value of the variable someAddress (a variable normally is nothing but the content of the memory at a certain address) is written to the esi register.
mov [someAddress], esi
Write the content of esi to the memory at address someAddress.
You might also say: Write the value of esi to the variable someAddress.
mov someAddress, esi
Would mean: Change the constant number that represents the address someAddress to the value of esi.
So if someAddress is located at address 1234 and esi contains the value 5678, the instruction would mean:
Change the mathematical constant 1234 so that 1234 = 5678 afterwards.
This is of course meaningless, because the mathematical constants 1234 and 5678 will never be equal. For this reason the x86 CPU has no such instruction.
(There are CPUs having similar instructions. On the SPARC CPUs for example instructions assigning a value to the zero register (which means: "assign a value to the constant zero") are used if you only want to have the instruction's side effects - like setting the flags - but you are not interested in the result itself.)
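If you know some C, the same distinction can be sketched there. This is only an analogy, with made-up function names: the array digit stands in for the NASM label, taking its address corresponds to the bare label, and reading or writing the element corresponds to the bracketed form.

```c
#include <stdint.h>

/* 'digit' plays the role of the NASM label: two bytes reserved in memory. */
static unsigned char digit[2] = { 0, 10 };

/* mov rsi, digit  -> the address itself, a plain number */
uintptr_t address_of_digit(void) { return (uintptr_t)&digit[0]; }

/* mov al, [digit] -> load the byte stored at that address */
unsigned char load_digit(void) { return digit[0]; }

/* mov [digit], al -> store a byte at that address */
void store_digit(unsigned char al) { digit[0] = al; }

/* 'mov digit, al' has no C counterpart that compiles:
   &digit[0] = al;   // error: lvalue required as left operand of assignment */
```

The address is a fixed number chosen at link time, so it can be read but not assigned to; only the memory it names can be changed.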

Reading to and from arrays in Assembly?

I'm having a bit of trouble reading to and from arrays in assembly.
It's a fairly simple program (albeit, at this point, far from finished). All I'm trying to do for now is read a string (which we're assuming consists of digits), convert it to a decimal number, and print it. Here's what I've got so far. As of now, it prints str1. After you enter a number and hit enter, it prints str1 again and freezes. Can anyone offer some insight as to what I'm doing wrong?
INCLUDE Irvine32.inc
.data
buffersize equ 80
buffer DWORD buffersize DUP (0)
str1 BYTE "Enter numbers to be added together. Press (Q) to Quit.", 0dh, 0ah,0;
str2 BYTE "The numbers entered were: ", 0dh, 0ah, 0
str3 BYTE "The total of numbers entered is: ", 0dh, 0ah, 0
error BYTE "Invalid Entry. Please try again.", 0dh, 0ah,0
value DWORD 0
.code
main PROC
mov edx, OFFSET str1
call Writestring
Input:
call readstring
mov buffer[edi], eax
cmp buffer[edi], 0
JL NOTDIGIT
cmp buffer[edi], 9
JG NOTDIGIT
call cvtDec
mov edx, buffer[edi]
call WriteString
jmp endloop
Notdigit:
mov edx, OFFSET error
call writestring
exit
cvtDec:
mov eax, buffer[edi]
AND eax,0Fh
mov buffer[edi],edx
ret
endloop:
main ENDP
END MAIN
First off, Mr. Irvine named the function WriteString, but you use two other spellings, writestring and Writestring, and the correct one in only one place. Get into the habit of spelling function names correctly now; it will cut down on bugs later.
Second, you created a label called Notdigit, yet you use JL NOTDIGIT and JG NOTDIGIT in your code. Again, use the correct spelling.
You also declared your entry point as main, but you close your code section with END MAIN instead of END main.
MASM should have given you an A2006 "undefined symbol" error for these if you have it set up to be case sensitive (by adding option casemap:none at the top of your source, or by opening irvine32.inc and uncommenting the line that says OPTION CASEMAP:NONE).
Let's look at the ReadString procedure comment in irvine32.asm:
; Reads a string from the keyboard and places the characters
; in a buffer.
; Receives: EDX offset of the input buffer
; ECX = maximum characters to input (including terminal null)
; Returns: EAX = size of the input string.
; Comments: Stops when Enter key (0Dh,0Ah) is pressed. If the user
; types more characters than (ECX-1), the excess characters
; are ignored.
ReadString takes the address of the buffer that will hold the input string in edx. You are passing the address of your prompt, str1; maybe you meant to use buffer? You also did not put the size of the buffer into ecx.
You're using edi as an index into your buffer, but what value does edi contain? You're trying to store the value of eax there, but what does eax contain? After ReadString, eax holds the size of the input string, not a character, and edi was never initialized, so it probably contains garbage; not what you want.
Look at this carefully:
cvtDec:
mov eax, buffer[edi]
AND eax,0Fh
mov buffer[edi],edx
You're putting a value (which you think is the ASCII value of a digit) into eax and then converting it to a decimal value... OK. Next, you put whatever is in edx back into your buffer, discarding the result in eax. Is that what you want?
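One more thing the code above trips over, sketched here in C rather than MASM: a typed digit arrives as an ASCII character, so the range check has to compare against the characters '0' and '9' (48 and 57), not the numbers 0 and 9. The function name and the return convention below are my own assumptions, and overflow on very long inputs is not handled.

```c
#include <stddef.h>

/* Convert a NUL-terminated string of ASCII digits to its numeric value.
   Returns 0 on success and stores the result in *out; returns -1 if the
   string is empty or any character falls outside '0'..'9'. */
int digits_to_value(const char *s, unsigned *out)
{
    unsigned value = 0;
    if (s == NULL || *s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        /* compare with the characters '0' and '9', not the numbers 0 and 9 */
        if (*s < '0' || *s > '9')
            return -1;
        /* same trick as AND eax,0Fh: '7' (0x37) & 0x0F == 7 */
        value = value * 10 + (unsigned)(*s & 0x0F);
    }
    *out = value;
    return 0;
}
```

The mask only gives the right answer after the range check has passed, which is why the check comes first.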

How do I ignore line breaks in input using NASM Assembly?

Learning NASM Assembly, I am trying to make a program that reads two one-digit number inputs.
I have two variables declared in the .bss:
num1 resb 1
num2 resb 1
Then, I ask the user to write the numbers like this:
; Get number 1
mov EAX,3
mov EBX,1
mov ECX,num1
mov EDX,1
int 0x80
; Get number 2
mov EAX,3
mov EBX,1
mov ECX,num2
mov EDX,1
int 0x80
Since I am only interested in one-digit number inputs, I set EDX to 1. This way, whatever the user types, only the first character will be stored in my variable (right?).
The problem is that everything that follows after that first character will be used for the future reads. If you type 5 and then press ENTER, the ASCII code 53 will be stored in num1 just fine, but the line break you generated by pressing ENTER will carry on to the next read instruction, which will be stored in num2. Clearly that's not what I was intending. I want the user to type a number, press ENTER, type another number, and press ENTER.
I am not entirely sure how to work around this in the simplest way possible.
The dumbest idea was to put a "dummy" read instruction between num1 and num2, which will capture the line break (and do nothing with it). This is obviously not good.
Here's a very basic way of reading input until you get digits you want. It will skip anything but digits. This approach is fine if it provides the functionality you want. If you need different behavior depending upon other non-numeric input, then you need to specify that behavior. Then that behavior can be programmed as well.
; Get number 1
mov ECX,num1
call GetNumber
; Get number 2
mov ECX,num2
call GetNumber
...
GetNumber:
pusha ; save regs
get:
mov EAX,3 ; system call for reading a character
mov EBX,0 ; 0 is standard input
mov EDX,1 ; number of characters to read
int 0x80 ; ECX has the buffer, passed into GetNumber
cmp byte [ecx],0x30
jl get ; Retry if the byte read is < '0'
cmp byte [ecx],0x39
jg get ; Retry if the byte read is > '9'
; At this point, if you want to just return an actual number,
; you could subtract '0' (0x30) off of the value read
popa ; restore regs
ret
Meddling with stdin to disable ICANON will work, but may be the "hard way". Using a two-byte buffer and doing mov edx, 2 will work if the pesky user is well behaved - either clear the second byte, or just ignore it.
Sometimes the pesky user is not well behaved. Dealing with "garbage input" or other error conditions generally takes much more code than just "doing the work"! Either deal with it, or be satisfied with a program that "usually" works. The second option may be sufficient for beginners.
The pesky user might just hit "enter" without entering a number. In this case, we want to either re-prompt, or perhaps print "Sorry you didn't like my program" and exit. Or he/she might type more than one character before hitting "enter". This is potentially dangerous! If a malicious user types "1rm -rf .", your program reads only the "1"; the leftover "rm -rf ." stays in the input buffer, and the shell reads it as its next command once your program exits. Unix is powerful, and like any powerful tool can be dangerous in the hands of an unskilled user.
You might try something like (warning: untested code ahead!)...
section .bss
num1 resb 1
num2 resb 1
trashbin resb 1
section .text
re_prompt:
; prompt for your number
; ...
; get the number (character representing the number!)
mov ecx, num1
reread:
mov edx, 1
mov ebx, 0 ; 1 will work, but 0 is stdin
mov eax, 3 ; sys_read
int 0x80
cmp byte [ecx], 10 ; linefeed
jz got_it
mov ecx, trashbin
jmp reread
got_it:
cmp byte [num1], 10 ; user entered nothing?
jz re_prompt ; or do something intelligent
; okay, we have a character in num1
; may want to make sure it's a valid digit
; convert character to number now?
; carry on
You may need to fiddle with that to make it work. I probably shouldn't post untested code (you can embarrass yourself that way!). "Something like that" might be easier for you than fiddling with termios. The second link Michael gave you includes the code I use for that. I'm not very happy with it (sloppy!), but it "kinda works". Either way, have fun! :)
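For comparison, roughly the same idea in C, reading one byte at a time with read() just like the sys_read loop. This is a sketch under my own assumptions (the function name and the -1 return for end of input are made up); unlike the assembly above, it stops when read() returns 0 instead of looping forever.

```c
#include <unistd.h>

/* Read one line from fd a byte at a time.
   Returns the first character of the line, '\n' if the line was empty,
   or -1 on end of input or a read error. The rest of the line, including
   the line feed, is consumed and thrown away (the "trashbin"). */
int read_first_char_of_line(int fd)
{
    unsigned char first, trash;
    ssize_t n = read(fd, &first, 1);
    if (n != 1)
        return -1;
    if (first == '\n')
        return '\n';            /* user entered nothing: caller can re-prompt */
    do {
        n = read(fd, &trash, 1);
    } while (n == 1 && trash != '\n');
    return first;
}
```

The caller would pass 0 for stdin, re-prompt when it gets '\n' back, and still needs to check that the returned character is a digit before subtracting '0'.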
You will have to disable canonical mode and read the keyboard raw.
This is how Linux manages, for example, entering a console password without showing it.
The assembly to do this is nicely described here:
http://asm.sourceforge.net/articles/rawkb.html

Why am I unable to print my number constant in NASM Assembly?

Learning NASM Assembly in 32-bit Ubuntu. I am somewhat confused:
In .bss, I reserve a byte for a variable:
num resb 1
Later I decided to give it a value of 5:
mov byte [num],5
And at some point print it out:
mov EAX,4
mov EBX,0
mov ECX,num
add ECX,'0' ; From decimal to ASCII
mov EDX,1
int 0x80
But it isn't printing anything.
I'm guessing that the problem is when I give num its value of 5. I originally wanted to do this:
mov byte num,5
As I thought that num refers to a position in memory, and so mov would copy 5 to such position. But I got an error saying
invalid combination of opcode and operands
So basically, why is the program not printing 5? And also, why was my suggestion above invalid?
To print using int 0x80 and code 4 you need ECX to be the address of the byte to print. You added '0' to the address of num that was in ECX before you called the print routine, so it was the address of something else out in memory somewhere.
You may want something like this. I created a separate area, numout to hold the ASCII version of num:
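numout resb 1 ; in .bss: holds the ASCII version of num
....
mov EAX,4 ; sys_write
mov EBX,1 ; 1 is standard output
mov CL,[num] ; load the value (5), not the address
add CL,'0' ; from number to ASCII
mov [numout],CL
mov ECX,numout ; ECX = address of the byte to print
mov EDX,1 ; length
int 0x80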
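The same two steps, written as a small C sketch (the function names are hypothetical): first convert the value to its ASCII character, then hand write() the address of a byte that holds that character.

```c
#include <unistd.h>

/* Turn a value 0..9 into its ASCII character, e.g. 5 -> '5' (0x35). */
char digit_to_ascii(unsigned value)
{
    return (char)('0' + value);
}

/* Write one digit to fd. write() needs the ADDRESS of the byte to print,
   so the character is stored in memory first, just like numout above.
   Returns 0 on success, -1 on failure. */
int print_digit(int fd, unsigned value)
{
    char numout = digit_to_ascii(value);
    return write(fd, &numout, 1) == 1 ? 0 : -1;
}
```

Adding '0' to the pointer instead of to the character would be the C equivalent of the add ECX,'0' in the question: the call still succeeds or fails, but it prints some unrelated byte.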

Problems with ATOI in x86 NASM Linux assembly

I don't understand how to convert a string to an integer.
This is for homework, but I do not want answers to the problem (a.k.a. correct code). I'd really appreciate it if someone could explain just what it is that I'm doing wrong! :(
Thanks in advance!!!
I'm running Ubuntu 12.04 on a virtual machine, 32 bit.
I compile with:
nasm -f elf proj2.asm
I link with:
gcc -o proj2 proj2.o
and then run it:
./proj2
It displays the first number, but then gives me a segmentation fault when I try to use atoi.
I have a teacher who wants us to:
read in numbers from a text file arranged as so:
4
5
4
2
9
(there is whitespace before each integer)
As per his instructions: "Be sure to read seven (7) characters into the buffer to get the entire line. These are the five characters representing the number together with characters CR and LF. CR is the Carriage Return character with hex code 0x0D and LF is the Line Feed character with hex code 0x0A."
I've erased the spaces from the file, and tried to read it that way, but it didn't help.
The ints are to be read, into an array on the stack, with a maximum number of ints of 250. That's not the problem though :/
Below is my code so far.
BUFFERSIZE equ 10
section .data
file_name: db "/home/r/Documents/CS/project2/source/indata.txt", 0x00
file_mode: db "r", 0x00
output: db "%i",0xa
test: db "hello world",10
format: db "%u"
numToRead: db 1
temp: db "hi"
num:db "1",0,0
section .bss
fd: resd 4
length: resd 4
buffer resb BUFFERSIZE
;i was trying to use buffers and just
;read through each character in the string,
;but i couldn't get it to work
section .text
extern fopen
extern atoi
extern printf
extern fscanf
extern fgets
extern getc
extern fclose
global main
main:
;setting up stack frame
push ebp
mov ebp, esp
;opens file, store FD to eax
push file_mode
push file_name
call fopen
;save FD from eax into fd
push eax
mov ebx,eax
mov [fd],ebx
;ebx holds the file descriptor
;push in reverse order
push ebx
push numToRead
push temp
call fgets
push eax
call printf ;prints length (this works, i get a 4.
;Changing the number in the file changes the number displayed.
;I can also read in several lines, just can't get any ints!
;(So i can't do any comparisons or loops :/ )
;i shouldn't need to push eax here, right?
;It's already at the top of the stack from the printf
;pop eax
;push eax
call atoi
;calling atoi gives me a segmentation fault error
push eax
call printf
mov esp,ebp
pop ebp
ret
edit:
Interestingly, I can call atoi just fine. It's when I then try to
push eax
call atoi
push eax
call printf
that i get segmentation faults.
Unless I just can't see it on my cellphone, you're not balancing the stack after your calls. Those C functions are cdecl, not stdcall, so the caller has to adjust the stack after each call. I do:
add esp, 4 * numofpushes
That might be the source of your seg faults.
edit: Interestingly, I can call atoi just fine. It's when I then try to
push eax
call atoi
push eax
call printf
that i get segmentation faults.
From the atoi reference: "On success, the function returns the converted integral number as an int value.".
Passing any random integer (like 4) as the first argument of the following printf (i.e. the format string pointer) is not likely to end well.
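Since the asker did not want corrected assembly, here is only the shape of the two calls in C, as a sketch with made-up helper names. atoi takes a pointer to the text and returns an int; the printf family takes a format string first and the int after it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse one line such as "    4\r\n". atoi skips leading whitespace and
   stops at the first non-digit character, so the CR/LF is ignored. */
int parse_line(const char *line)
{
    return atoi(line);
}

/* Format the number with the "%i\n" format the question's output string
   intends. The format string always comes before the integer.
   Returns the number of characters written (excluding the NUL). */
int format_number(char *dst, size_t dst_size, int value)
{
    return snprintf(dst, dst_size, "%i\n", value);
}
```

In the assembly this corresponds to pushing the integer first and the address of the format string last, so that the format string ends up as the first argument.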
