Finding escape characters in AT&T x86 assembly - string

Question 1: I have the following assembler code, whose purpose is to loop through an input string, and count the number of escape characters '%' it encounters:
.globl sprinter
.data
.escape_string: .string "%"
.num_escape: .long 0
.num_characters: .long 0
.text
sprinter:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%ecx # %ecx = parameter 1
loop:
cmpb $0, (%ecx) # if end of string reached
jz exit
cmpl $.escape_string,(%ecx) # if escape character found
je increment
back:
incl .num_characters
incl %ecx
jmp loop
increment:
incl .num_escape
jmp back # jump to 'back'
exit:
movl .num_escape, %eax # return num_escape
popl %ebp
ret
This assembly code is compiled together with the following C code:
#include <stdio.h>
extern int sprinter (char* string);
int main (void)
{
int n = sprinter("a %d string of some %s fashion!");
printf("value: %d",n);
return 0;
}
The expected output from running this code is value: 2 (because there are two '%' characters in the string), but it returns value: 0, meaning the following line fails (because it never increments the counter):
cmpl $.escape_string,(%ecx) # if escape character found
Am I using the wrong method of comparing for the string? The outer loop works fine, and .num_characters correctly contains the number of characters in my string. I generated some assembly code for a simple C-program that compared a string "hello" to "hello2", and this is the relevant code:
.LC0:
.string "hello"
.LC1:
.string "hello2"
...
movl $.LC0, -4(%ebp)
cmpl $.LC1, -4(%ebp)
It looks very similar to what I tried, no?
Question 2. This code is part of what is going to be a simplified sprintf-function written in assembly. This means the first parameter should be the result string, and the second parameter is the formatting. How do I copy a byte character from our current position in one register to our current position in another register? Let's assume we've assigned our parameters into two registers:
movl 8(%ebp),%edx # %edx = result-string
movl 12(%ebp),%ecx # %ecx = format-string
I tried the following in the loop:
movb (%ecx), %al
movb %al, (%edx) # copy current character to current position in result register
incl %ecx
incl %edx
But the result string just contains a (the first character in my string), and not the full string as I expected.
All help appreciated because this comparison problem (question 1) is currently keeping me stuck.

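Regarding question 1: you are comparing single-byte characters, so 'cmpl' should be 'cmpb' when checking for the escape character. Note also that $.escape_string is the address of the string, not the character '%', so the original instruction compares four bytes of the input against a pointer value. The compiler-generated "hello"/"hello2" code does the same kind of thing: it compares two pointers, not the contents of the strings. You will need to load your character into a register first (or compare against the immediate $'%'). I'm not very familiar with AT&T assembly, so I hope this is correct.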
Before loop:
movb .escape_string, %al
Comparison:
cmpb %al, (%ecx)
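For reference, the intended loop can be sketched in C, where each comparison is on a single byte, just as cmpb does. The function name count_escapes is a hypothetical stand-in for sprinter, and the comments only indicate roughly which instruction each line corresponds to.

```c
/* Hypothetical C equivalent of the counting loop: one byte per comparison. */
int count_escapes(const char *s)
{
    int count = 0;
    for (; *s != '\0'; s++) {   /* cmpb $0, (%ecx): stop at the terminator */
        if (*s == '%')          /* cmpb $'%', (%ecx): compare one byte     */
            count++;            /* incl .num_escape                        */
    }
    return count;
}
```

Called with the string from the question, "a %d string of some %s fashion!", this returns 2.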

Related

Problems with writing an x86 Assembly program that reverses two words separated by a space

I'm trying to write a program that takes a string input of two words, then reverses the order of the words and prints them. Right now I'm struggling with a few problems because the teacher didn't teach us how to write x86 assembly very well.
Currently I'm trying to figure out how to:
Find the start of the array again. I believe the current program corrupts the stack, because it prints ♣ for 8(%esp,%ebx) when %ebx = 0.
Store %ebx when the index hits the space, preferably in %ecx, though I do not know how to prevent _printf from clobbering %ecx in the current setup.
Lastly, whenever I input two words where the first word is shorter than 3 letters, it doesn't output the 2nd word.
Right now, this is the output of my program (when the input is "Hello World")
Enter a string:
Hello World (input)
Your string is:
Hello World
Index of Space is = 6
World♣ <--- Supposed to be World Hello, in this case just testing to make sure I can find the 1st element of %eax
The registers I am currently using are %eax for the string array, and %ebx for the index of %eax. I would like to use %ecx, or any other register to store %ebx value when %ebx hits the space character.
Here is my x86 Assembly code so far:
LC0:
.ascii "Enter a string: \0"
LC1:
.ascii "%[^\n]\0"
LC2:
.ascii "Your string is:\12\0"
LC3:
.ascii "\12Index of Space is = %d\12\0"
LC4:
.ascii "%c\0" # print out the character
LC6:
.ascii "Yeehaw!\0" # test message
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $112, %esp
movl $LC0, (%esp)
call _puts # similar to _printf function
leal 8(%esp), %eax
movl %eax, 4(%esp)
movl $LC1, (%esp)
call _scanf # get string
movl $LC2, (%esp)
call _printf # printout the output text of LC2
###### A[i] ###### ebx is the index
movl $0, %ebx #index of array i=0
movl $0, %ecx # Where I want to store space index
jmp .L2
.L1: #TEXT PORTION
movsbl %al, %eax # eax =FFFFFFXX XX means for each character print out the input character
incl %ebx # next index of array
movl %eax, 4(%esp) # print out from the first element of the array
movl $LC4, (%esp)
call _printf
.L2: #TEXT PORTION
movzbl 8(%esp,%ebx), %eax # load the byte at esp+8+ebx into eax (the array starts at esp+8)
cmpb $0x20, %al # compare eax to see if it is a space
jne .EndOfLine
movl %ebx, 8(%esp) # Save space index
.EndOfLine:
testb %al, %al # if al is not zero jump to .L1; zero means end of the string
jne .L1
movl 8(%esp), %ebx # Take space index back
incl %ebx
#States space index
movl %ebx, 4(%esp) # print out from the first element of the array
movl $LC3, (%esp)
call _printf
jmp .T2
.T1: #TEXT PORTION
movsbl %al, %eax # eax =FFFFFFXX XX means for each character print out the input character
incl %ebx # next index of array
movl %eax, 4(%esp) # print out from the first element of the array
movl $LC4, (%esp)
call _printf
.T2: #TEXT PORTION
movzbl 8(%esp,%ebx), %eax # load the byte at esp+8+ebx into eax (the array starts at esp+8)
testb %al, %al # if al is not zero jump to .T1; zero means end of the string
jne .T1
.Test:
#Test to find initial array value, currently prints ♣ at 8(%esp,$0)
movl $0, %ebx
movzbl 8(%esp,%ebx), %eax
movsbl %al, %eax
movl %eax, 4(%esp)
movl $LC4, (%esp)
call _printf
.Done:
movl $0, %eax
addl $112, %esp
leave
ret
Any help would be appreciated. Please also explain why I need to do certain things, since my teacher wasn't the best at explaining topics like the stack.
UPDATE: 8(%esp,%ebx) with %ebx = 0 now correctly locates the first letter; I am using 116(%esp) as a local variable to store %ebx.
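As far as can be told from the posted code, the space index is saved with movl %ebx, 8(%esp), and 8(%esp) is also where the string buffer starts, so the index appears to overwrite the first four characters of the input. That would be consistent with the ♣ (byte value 5 on a code page 437 console) and with the second word vanishing when the first word is shorter than three letters. A minimal C sketch of the intended logic keeps the index in its own variable; swap_two_words is a hypothetical helper name, not something from the assignment.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "second first" into out and returns the
 * zero-based index of the space, or -1 if the input has no space. */
int swap_two_words(const char *in, char *out, size_t out_size)
{
    const char *space = strchr(in, ' ');
    if (space == NULL) {
        snprintf(out, out_size, "%s", in);   /* nothing to swap */
        return -1;
    }
    int space_index = (int)(space - in);     /* kept separate from the buffer */
    /* Second word first, then the first space_index characters of the input. */
    snprintf(out, out_size, "%s %.*s", space + 1, space_index, in);
    return space_index;
}
```

For "Hello World" the helper returns 5 and fills out with "World Hello"; the posted assembly reports 6 because it increments the index before printing it.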

Convert a string of digits to an integer by using a subroutine

Assembly language program to read in a (three-or-more-digit) positive integer as a string and convert the string to the actual value of the integer.
Specifically, create a subroutine to read in a number. Treat this as a string, though it will be composed of digits. Also, create a subroutine to convert a string of digits to an integer.
Do not have to test for input where someone thought i8xc was an integer.
I am doing it like this. Please help.
.section .data
String:
.asciz "1234"
Intg:
.long 0
.section .text
.global _start
_start:
movl $1, %edi
movl $String, %ecx
character_push_loop:
cmpb $0, (%ecx)
je conversion_loop
movzx (%ecx), %eax # move byte from (%ecx) to eax
pushl %eax # Push the byte on the stack
incl %ecx # move to next byte
jmp character_push_loop # loop back
conversion_loop:
popl %eax # pop off a character from the stack
subl $48, %eax # convert to integer
imul %edi, %eax # eax = eax*edi
addl %eax, Intg
imul $10, %edi
decl %ecx
cmpl $String, %ecx # stop once %ecx is back at the front (%ecx == $String)
je end # When done jump to end
jmp conversion_loop
end:
pushl Intg
addl $8, %esp # clean up the stack
movl $0, %eax # return zero from program
ret
Also, I am unable to get the output. I am getting a Segmentation Fault. I am not able to find out what is the error in my code.
Proper interaction with the operating system is missing: _start is entered directly by the kernel rather than called, so there is no return address for ret to use, and the program has to end with an exit system call.
At the end, you push the result, but the following addl $8, %esp discards the pushed value, and the final ret then transfers control to whatever value was in memory at SS:ESP+4 at program entry.
Once you increase the stack pointer, you cannot rely on data below ESP surviving.
Your program also does not interact with its user. If you want it to print something, use the system call for writing.
print_String:
movl $4, %eax # System call "sys_write".
movl $1, %ebx # File descriptor of the standard output (console).
movl $String, %ecx # Pointer to the text string.
movl $4, %edx # Number of bytes to print.
int $0x80 # Invoke the kernel.
end:
movl $1, %eax # System call "sys_exit".
movl Intg, %ebx # Terminate gracefully with exit status Intg (only the low 8 bits are reported).
int $0x80 # Invoke the kernel.
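The conversion itself does not need the stack. A common alternative to the push/pop approach in the question is to walk the string from the front and multiply the running value by 10 before adding each digit. The C sketch below shows that variant; digits_to_uint is a hypothetical name, and input validation is left out, as the assignment allows.

```c
/* Hypothetical helper: converts a string of decimal digits to its value.
 * No validation is done, matching the assignment's assumptions. */
unsigned int digits_to_uint(const char *s)
{
    unsigned int value = 0;
    while (*s != '\0') {
        /* Shift the digits seen so far one decimal place, then add the new one. */
        value = value * 10 + (unsigned int)(*s - '0');
        s++;
    }
    return value;
}
```

For the string "1234" from the question this yields 1234.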

(AT&T Debian 32-bit OS) How to reverse a list of values into another variable in Assembly

So, the goal of this (school) assignment is to take an input file of 16 words of no longer than 16 characters in size (including a \n), reverse their order, and output them to a new file. The contents of the input file are below.
input.txt:
1. First.......
2. Second......
3. Third.......
4. Fourth......
5. Fifth.......
6. Sixth.......
7. Seventh.....
8. Eighth......
9. Ninth.......
10. Tenth......
11. Eleventh...
12. Twelfth....
13. Thirteenth.
14. Fourteenth.
15. Fifteenth..
16. Sixteenth..
So far, I've managed to take in the input file and assign it to a variable. I've verified this with temporary print statements (commented out in my code below). But when I go to reverse this into an Output variable (before writing said variable to a new file), it doesn't work.
My current code and the result I get after running it are below. I've been working at this for a good couple of hours now, and it is pretty frustrating how little info is out there about this particular syntax of assembly, so any clear-cut help would be appreciated. Take note: I am a novice here, so I may not understand assembly terms or concepts that might come across as basic. Bear with me please!
My code:
.section .data
InFileName:
.asciz "input.txt"
InFileHandle:
.int 0
OutFileName:
.asciz "output.txt"
OutFileHandle:
.int 0
Output:
.fill 256
.section .bss
Input:
.fill 256
.section .text
.globl _start
_start:
movl $5, %eax # Open file
movl $InFileName, %ebx
movl $02, %ecx
movl $0777, %edx
int $0x80
test %eax, %eax # Check for file error
js Error
movl %eax, InFileHandle # Puts Input file handle into variable
movl $3, %eax # Read file into variable
movl InFileHandle, %ebx
movl $Input, %ecx
movl $InputLen, %edx
int $0x80
# call PrintVariable # Temporary call for testing
call Reverse
PrintVariable:
movl $4, %eax
movl $1, %ebx
int $0x80
ret
Error:
call ExitProg
Reverse:
movl $Input, %esi # Move %esi to Input
movl $(Output+240), %edi # Move %edi to end of Output
cld # Clear direction flag
movl $16, %ecx # Move input length to %ecx
subl $32, %edi # Subtract twice input length from %edi
rep movsb # Repeat until %ecx is 0
movl $Output, %ecx # Assign output for printing
movl $256, %edx
call PrintVariable # Printing value for testing purposes
ExitProg:
movl $1, %eax
movl $0, %ebx
int $0x80
My output after running this code:
1. First.......
Not sure where I'm going wrong here, so any help would be much appreciated. After I solve this problem (reversing the inputted data into a new variable), I need to push that data into an output file. I'm sure I can figure that part out myself, but I won't say no to help there either.
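Judging from the posted Reverse routine, rep movsb runs once with %ecx = 16, so only a single 16-byte record is copied, which matches the single line of output. Reversing the whole file needs an outer loop over all 16 records. A minimal C sketch of that loop follows; the names reverse_records, RECORD_LEN and RECORD_COUNT are assumptions made for illustration, and each record is taken to be exactly 16 bytes including the newline.

```c
#include <string.h>

enum { RECORD_LEN = 16, RECORD_COUNT = 16 };   /* 16 lines of 16 bytes each */

/* Hypothetical helper: copies record i of input to record (15 - i) of output.
 * Both buffers must hold RECORD_LEN * RECORD_COUNT = 256 bytes. */
void reverse_records(const char *input, char *output)
{
    for (int i = 0; i < RECORD_COUNT; i++) {
        const char *src = input + i * RECORD_LEN;
        char *dst = output + (RECORD_COUNT - 1 - i) * RECORD_LEN;
        memcpy(dst, src, RECORD_LEN);           /* the job of one rep movsb */
    }
}
```

In the assembly version, the memcpy corresponds to one rep movsb with %ecx reloaded to 16, after which %edi has to be moved back by 32 bytes (the 16 just written plus the 16 of the previous slot) before the next iteration.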

Comparing Inputted Word to Array in x86 GNU GAS Assembly

This is actually my second question about this particular problem today, but the other one was answered pretty quickly.
In essence, I am trying to take in a string of letters (no numbers or symbols) and then compare each inputted letter to an array of .asciz values that represent the NATO Military Phonetic Alphabet (Alpha, Bravo, Charlie, etc.) and output the representative Military equivalent to the letter.
This is where I am stuck. I'm fairly new to Assembly and this is a homework assignment, so help is much needed and appreciated. My professor is not great at offering resources to learn this stuff and it's difficult to find good resources for exact problems online.
Any help would be much appreciated. Specifically on how to compare each letter input to the array. I've already successfully stored the input in a variable.
Below is a C# representation of what I am attempting to do.
using System;
class MilAlpha
{
static void Main(string[] args)
{
string input;
string[] miliAlpha = { "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
"Hotel", "India", "Juliet", "Kilo", "Lima", "Mike", "November",
"Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
"Victor", "Whiskey", "X-Ray", "Yankee", "Zulu" };
Console.WriteLine("Enter a string of text: ");
input = Console.ReadLine();
for (int i = 0; i < input.Length; i++) {
for (int j = 0; j < miliAlpha.Length; j++) {
if (input[i] == ' ')
Console.WriteLine("\n");
string temp = miliAlpha[j].ToLower();
if (input[i] == temp[0])
Console.WriteLine("\n" + miliAlpha[j] + "\n");
}
}
Console.ReadKey();
}
}
EDIT:
So I believe this should do what I am trying to do, but it doesn't seem to work as intended. It compares the correct things in the debugger, but when it goes to print the respective portion of the array, it simply doesn't print anything.
.section .data
MAlpha:
.asciz "Alpha \n"
.equ ElementLen, .-MAlpha
.asciz "Bravo \n"
.asciz "Charlie \n"
.asciz "Delta \n"
.asciz "Echo \n"
.asciz "Foxtrot \n"
.asciz "Golf \n"
.asciz "Hotel \n"
.asciz "India \n"
.asciz "Juliet \n"
.asciz "Kilo \n"
.asciz "Lima \n"
.asciz "Mike \n"
.asciz "November \n"
.asciz "Oscar \n"
.asciz "Papa \n"
.asciz "Quebec \n"
.asciz "Romeo \n"
.asciz "Sierra \n"
.asciz "Tango \n"
.asciz "Uniform \n"
.asciz "Victor \n"
.asciz "Whiskey \n"
.asciz "X-Ray \n"
.asciz "Yankee \n"
.asciz "Zulu \n"
.asciz " \n"
.equ MAlphaLen, .-MAlpha
Input:
.fill 80
.equ InputLen, .-Input
InputMSG:
.ascii "Please enter a word: "
.equ InputMSGLen, .-InputMSG
BlankLine:
.ascii "\n"
.equ BlankLineLen, .-BlankLine
Converting:
.ascii "\nConverting to NATO Alphabet...\n\n"
.equ ConvertingLen, .-Converting
.section .bss
.section .text
.globl _start
GetInput:
movl $4, %eax
movl $1, %ebx
movl $InputMSG, %ecx
movl $InputMSGLen, %edx
int $0x80
movl $3, %eax
movl $0, %ebx
movl $Input, %ecx
movl $InputLen, %edx
int $0x80
ret
PrintInput:
movl $4, %eax
movl $1, %ebx
movl $BlankLine, %ecx
movl $BlankLineLen, %edx
int $0x80
movl $4, %eax
movl $1, %ebx
movl $Input, %ecx
movl $InputLen, %edx
int $0x80
movl $4, %eax
movl $1, %ebx
movl $Converting, %ecx
movl $ConvertingLen, %edx
int $0x80
ret
Convert:
# Get first letter of input string
# Compare letter to first letter of each array entry
# When match is found, print Array entry to screen
# Repeat until end of input string
movl $Input, %eax
movl $MAlpha, %edi
call Loop
ret
Loop:
movb (%eax), %al
cmp $0x0A, %al
je Finished
call CompareAlpha
jmp Loop
CompareAlpha:
movb (%edi), %bl
cmpb %bl, %al
je PrintWord
addl $ElementLen, %edi
jmp CompareAlpha
PrintWord:
movl $4, %eax
movl $1, %ebx
movl (%edi), %eax
movl $ElementLen, %edx
int $0x80
Finished:
call ExitProg
_start:
call GetInput
call PrintInput
call Convert
call ExitProg
PrintMAlpha:
movl $4, %eax
movl $1, %ebx
movl $MAlpha, %ecx
movl $MAlphaLen, %edx
int $0x80
ExitProg:
movl $1, %eax
movl $0, %ebx
int $0x80
Here are a few bugs to get you started:
In Loop, you are keeping the pointer to the input string in %eax, but you load the character into %al which is the low byte of %eax, thus trashing its value. Pick another register for one of them.
You never increment your pointer in Loop, so it will loop forever (if it doesn't crash first due to one of your other bugs).
CompareAlpha doesn't reset %edi on successive calls. So if the first character is 'H', %edi will be left pointing to "Hotel" after the call. If the next character is E, CompareAlpha will search forward for it starting from "Hotel". Of course it won't find it, so it runs off the end of the array and crashes.
PrintWord loads four bytes of the string into %eax (overwriting the system call number), whereas it should be loading the address of the string into %ecx. Replace movl (%edi), %eax by movl %edi, %ecx (note it is now a register-to-register move and not a load from memory).
PrintWord clobbers registers %eax, %ebx, %ecx, %edx, some of which its caller is expecting to remain unchanged. Either push and pop those registers, or rewrite CompareAlpha to do so before calling it.
PrintWord is missing a ret at the end, so it falls through into ExitProg.
After fixing these, I was able to successfully convert the string "HELLO".
These were all findable by single-stepping the code in the gdb debugger (si command) and watching the contents of registers (display $eax) and what they point to (display/s $edi, etc). I suggest practicing this.
Note that a more efficient design, instead of linear search through the array of codewords, would be to simply index into it. Take your character and subtract the ASCII code of 'A' (0x41), multiply by $ElementLen, and add to $MAlpha. Now you have a pointer to the desired codeword without looping. If you use an auxiliary array of pointers as in your other post, this is even easier as each pointer has length 4, so you can use the SIB addressing mode and do movl MAlpha(,%eax,4), %edi; make sure the high 24 bits of %eax are zeroed. This also avoids the need for padding all the code words with spaces (though then you'll need to write your own strlen to compute the length, or have a separate array of lengths, or write out one byte at a time until you see the 0 at the end).
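The indexing idea can be sketched in C with an array of pointers, which plays the role of the auxiliary pointer table mentioned above. The names nato_words and nato_word are hypothetical, the spellings follow the question's table, and the code assumes an ASCII-compatible character set in which 'A' to 'Z' are contiguous.

```c
#include <ctype.h>
#include <stddef.h>

/* Table of codewords, indexed by (letter - 'A'). */
static const char *const nato_words[26] = {
    "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
    "Hotel", "India", "Juliet", "Kilo", "Lima", "Mike", "November",
    "Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
    "Victor", "Whiskey", "X-Ray", "Yankee", "Zulu"
};

/* Hypothetical helper: returns the codeword for a letter, or NULL for
 * anything that is not a letter. No search loop is needed. */
const char *nato_word(char letter)
{
    int upper = toupper((unsigned char)letter);
    if (upper < 'A' || upper > 'Z')
        return NULL;
    return nato_words[upper - 'A'];   /* like movl table(,%eax,4), %edi */
}
```

Because the table stores pointers, the words need no padding to a common length; the caller finds the end of each word from its terminating 0.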
Also, as a general tip, it would be wise to document each of your subroutines: what exactly does it do, in which registers does it expect its inputs and leave its outputs, and which registers does it clobber? You may want to try to have some commonality between these, perhaps even creating your own standard calling conventions.

Assembly: writing multiple lines into a buffer

I have a problem with my second ever program in assembly. The task is to read multiple lines of text from the keyboard and store them in a buffer (.comm). After an empty line is entered, the program should echo each previously typed line of text in a loop. The limit for one line of text is 100 characters. However, I get a "program received signal sigsegv segmentation fault / 0x00000000006000a5 in check ()" error message.
My idea is to create a buffer of size 5050 bytes. Each line of text can have at most 100 characters. Here is a visual structure of the buffer:
[First line ][0][Second line ][0][Short ][0][Text ][0]
UPDATE: According to Jester's reply (thanks!), I've slightly modified my idea for the program. I abandoned the idea of 100 bytes per line. I'll simply place them one after another, simply separating them with a special character (0). So a new structure of the buffer would be:
[First line of text][0][No matter how long it is][0][short][0]
However, I've got a problem with appending the special "0" character to the BUFFER in the "add_separator" part. I also wonder whether it's really necessary, since we also add the "\n" newline indicator to the BUFFER?
Also, the check for whether the entered line is empty never returns true (empty line state), so the program keeps reading new lines. Did I miss anything?
Here is an updated bit of code:
SYSEXIT = 1
SYSREAD = 3
SYSWRITE = 4
STDOUT = 1
STDIN = 0
EXIT_SUCCESS = 0
.align 32
.data #data section
.comm BUFFER, 5050 #my buffer of a size of 5050 bytes
BUFFER_len = 5050
.global _start
_start:
mov $0,%esi
read:
mov $SYSREAD, %eax
mov $STDIN, %ebx
mov BUFFER(%esi), %ecx
mov $1, %edx
int $0x80
check:
cmp $0, %eax # check if entered line is empty
je end # if yes, end program
lea BUFFER(%esi), %ecx # move the latest character for comparison
cmp '\n', %ecx # check if it's a line end
inc %esi # increment the iterator
je end
jmp read
end:
mov $SYSWRITE, %eax
mov $STDOUT, %ebx
mov $BUFFER, %ecx
mov $BUFFER_len, %edx
int $0x80
mov $SYSEXIT, %eax
mov $EXIT_SUCCESS, %ebx
int $0x80
Thanks in advance for any tips!
Filip
A few things:
Trust me, you don't want to use esp as a general-purpose register as a beginner.
The read system call will read at most as many bytes as you specify (in this case BUFFER_LEN), and it returns the number of bytes read. You should pass in 1 instead, so you can read char by char.
Adding 100 for the next word (you really mean line, right?) isn't terribly useful; just append each character continuously, since that's how you want to print them too.
cmp '\n', %al would try to use the '\n' as an address; you want cmp $'\n', %al to use an immediate.
Learn to use a debugger to find your own mistakes.
Using jg to jump over a jle is really unnecessary; just keep the jle and let execution continue normally otherwise.
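Putting these points together, the read loop can be sketched in C on top of the POSIX read call. The helper name read_lines is hypothetical, and the sketch assumes the newline itself serves as the line separator, so no extra 0 byte is stored between lines.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: reads from fd one byte at a time into buf and stops
 * at an empty line, end of input, a read error, or a full buffer.
 * Returns the number of bytes stored (the empty line is not stored). */
size_t read_lines(int fd, char *buf, size_t cap)
{
    size_t used = 0;         /* bytes stored so far (the role of %esi) */
    size_t line_start = 0;   /* index where the current line begins    */

    while (used < cap) {
        ssize_t n = read(fd, buf + used, 1);   /* pass the address, length 1 */
        if (n <= 0)
            break;                             /* end of input or error */
        if (buf[used] == '\n') {               /* compare the byte, not the address */
            if (used == line_start)
                break;                         /* newline at line start: empty line */
            line_start = used + 1;
        }
        used++;
    }
    return used;
}
```

The caller can then echo the stored text with a single write of the returned length, rather than writing out the whole 5050-byte buffer.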

Resources