Assembly: writing multiple lines into a buffer - linux

I have a problem with my second-ever program in assembly. The task is to read multiple lines of text from the keyboard and write them into a buffer (.comm). After an empty line is entered, the program should echo, in a loop, each previously typed line of text. The limit for one line of text is 100 characters. However, I get a "program received signal sigsegv segmentation fault / 0x00000000006000a5 in check ()" error message.
My idea is to create a buffer of size 5050 bytes. Each line of text can have at most 100 characters. Here is a visual structure of the buffer:
[First line ][0][Second line ][0][Short ][0][Text ][0]
UPDATE: Following Jester's reply (thanks!), I've slightly modified my idea for the program. I abandoned the idea of 100 bytes per line. I'll place the lines one after another, separated by a special character (0). So the new structure of the buffer would be:
[First line of text][0][No matter how long it is][0][short][0]
However, I've got a problem with appending the special "0" character to the BUFFER in the "add_separator" part. I also wonder whether it's really necessary, since we add the "\n" newline indicator to the BUFFER as well.
Also, the part where I check whether the entered line of text is empty never returns true (empty line state), so the program keeps loading new lines. Did I miss anything?
Here is an updated bit of code:
SYSEXIT = 1
SYSREAD = 3
SYSWRITE = 4
STDOUT = 1
STDIN = 0
EXIT_SUCCESS = 0
.align 32
.data #data section
.comm BUFFER, 5050 #my buffer of a size of 5050 bytes
BUFFER_len = 5050
.global _start
_start:
mov $0,%esi
read:
mov $SYSREAD, %eax
mov $STDIN, %ebx
mov BUFFER(%esi), %ecx
mov $1, %edx
int $0x80
check:
cmp $0, %eax # check if entered line is empty
je end # if yes, end program
lea BUFFER(%esi), %ecx # move the latest character for comparison
cmp '\n', %ecx # check if it's a line end
inc %esi # increment the iterator
je end
jmp read
end:
mov $SYSWRITE, %eax
mov $STDOUT, %ebx
mov $BUFFER, %ecx
mov $BUFFER_len, %edx
int $0x80
mov $SYSEXIT, %eax
mov $EXIT_SUCCESS, %ebx
int $0x80
Thanks in advance for any tips!
Filip

A few things:
Trust me, you don't want to use esp as a general purpose register as a beginner
The read system call will read at most as many bytes as you specify (in this case BUFFER_LEN) and it will return the number of bytes read. You should pass in 1 instead, so you can read char-by-char.
Adding 100 for the next word (you really mean line, right?) isn't terribly useful, just append each character continuously since that's how you want to print them too.
cmp '\n', %al would try to use the '\n' as an address; you want cmp $'\n', %al to use an immediate.
Learn to use a debugger to find your own mistakes
using jg to jump over a jle is really unnecessary, just keep the jle and let the execution continue normally otherwise
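The advice above (read one byte at a time, append continuously, stop at an empty line) can be sketched in C before translating it back to assembly. This is only an illustration under assumptions: read_lines is a hypothetical name, and "empty line" is taken to mean a newline that arrives first or directly after another newline.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: reads bytes one at a time from fd into buf.
 * Stops at an empty line (a '\n' that is the first byte or directly
 * follows another '\n'), at end of input, on error, or when buf is full.
 * Returns the number of bytes stored; the newline of the empty line
 * itself is not stored. */
size_t read_lines(int fd, char *buf, size_t cap)
{
    size_t n = 0;
    while (n < cap) {
        char c;
        ssize_t got = read(fd, &c, 1);   /* read() returns the byte count */
        if (got != 1)
            break;                       /* 0 = end of input, -1 = error */
        if (c == '\n' && (n == 0 || buf[n - 1] == '\n'))
            break;                       /* empty line ends the input phase */
        buf[n++] = c;                    /* append continuously, '\n' included */
    }
    return n;
}
```

Echoing is then a single write(1, buf, n) with the returned count, so only the bytes actually stored are printed rather than the whole 5050-byte buffer. Because each line keeps its own "\n", no extra separator byte is needed for this purpose.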

Related

Loading value at address into register

As a learning exercise, I've been handwriting assembly. I can't seem to figure out how to load the value of an address into a register.
Semantically, I want to do the following:
_start:
# read(0, buffer, 1)
mov $3, %eax # System call 3 is read
mov $0, %ebx # File handle 0 is stdin
mov $buffer, %ecx # Buffer to write to
mov $1, %edx # Length of buffer
int $0x80 # Invoke system call
lea (%ecx, %ecx), %edi # Pull the value at address into %edi
cmp $97, %edi # Compare to 'a'
je done
I've written a higher-level implementation in C:
char buffer[1];
int main()
{
read(0, buffer, 1);
char a = buffer[0];
return (a == 'a') ? 1 : 0;
}
But compiling with gcc -S produces assembly that doesn't port well into my implementation above.
I think lea is the right instruction I should be using to load the value at the given address stored in %ecx into %edi, but upon inspection in gdb, %edi contains a garbage value after this instruction is executed. Is this approach correct?
Instead of the lea instruction, what you need is:
movzbl (%ecx), %edi
That is, zero extending into the edi register the byte at the memory address contained in ecx.
_start:
# read(0, buffer, 1)
mov $3, %eax # System call 3 is read
mov $0, %ebx # File handle 0 is stdin
mov $buffer, %ecx # Buffer to write to
mov $1, %edx # Length of buffer
int $0x80 # Invoke system call
movzbl (%ecx), %edi # Pull the value at address ecx into edi
cmp $97, %edi # Compare to 'a'
je done
Some advice
You don't really need the movz instruction: you don't need a separate load operation, since you can compare the byte in memory pointed by ecx directly with cmp:
cmpb $97, (%ecx)
You may want to specify the character to be compared against (i.e., 'a') as $'a' instead of $97 in order to improve readability:
cmpb $'a', (%ecx)
Avoiding conditional branches is usually a good idea. Immediately after performing the system call, you could use the following code that uses cmov for determining the return value, which is stored in eax, instead of performing a conditional jump (i.e., the je instruction):
xor %eax, %eax # set eax to zero
cmpb $'a', (%ecx) # compare to 'a'
cmovz %edx, %eax # conditionally move edx(=1) into eax
ret # eax is either 0 or 1 at this point
edx was set to 1 prior to the system call. Therefore, this approach above relies on the fact that edx is preserved across the system call (i.e., the int 0x80 instruction).
Even better, you could use sete on al after the comparison instead of the cmov:
xor %eax, %eax # set eax to zero
cmpb $'a', (%ecx) # compare to 'a'
sete %al # conditionally set al
ret # eax is either 0 or 1 at this point
The register al, which was set to zero by means of xor %eax, %eax, will be set to 1 if the ZF flag was set by the cmp (i.e., if the byte pointed by ecx is 'a'). With this approach you don't need to care about thinking whether the syscall preserves edx or not, since the outcome doesn't depend on edx.
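To make the difference between the instructions concrete, here is a rough C analogy. The function names are hypothetical and only mirror what each instruction computes: lea does address arithmetic without touching memory, movzbl loads one byte and zero-extends it, and cmpb followed by sete yields 0 or 1.

```c
#include <stdint.h>

/* Analogy for: lea (%ecx,%ecx), %edi
 * Pure address arithmetic (base + index); memory is never read,
 * which is why the result looked like garbage in gdb. */
uintptr_t lea_like(const unsigned char *p)
{
    return (uintptr_t)p + (uintptr_t)p;
}

/* Analogy for: movzbl (%ecx), %edi
 * Loads the byte at the address and zero-extends it to 32 bits. */
uint32_t movzbl_like(const unsigned char *p)
{
    return (uint32_t)*p;
}

/* Analogy for: cmpb $'a', (%ecx) followed by sete %al
 * The comparison result itself becomes the 0/1 value. */
int is_a(const unsigned char *p)
{
    return *p == 'a';
}
```

Compilers commonly turn a function like is_a into a compare plus sete without any branch, which matches the last suggestion in the answer.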

Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf

I have written an assembly program in AT&T syntax to display the factorial of a number, but it's not working. Here is my code:
.text
.globl _start
_start:
movq $5,%rcx
movq $5,%rax
Repeat: #function to calculate factorial
decq %rcx
cmp $0,%rcx
je print
imul %rcx,%rax
cmp $1,%rcx
jne Repeat
# Now result of factorial stored in rax
print:
xorq %rsi, %rsi
# function to print integer result digit by digit by pushing in
#stack
loop:
movq $0, %rdx
movq $10, %rbx
divq %rbx
addq $48, %rdx
pushq %rdx
incq %rsi
cmpq $0, %rax
jz next
jmp loop
next:
cmpq $0, %rsi
jz bye
popq %rcx
decq %rsi
movq $4, %rax
movq $1, %rbx
movq $1, %rdx
int $0x80
addq $4, %rsp
jmp next
bye:
movq $1,%rax
movq $0, %rbx
int $0x80
.data
num : .byte 5
This program prints nothing. I also used gdb to step through it: it works fine until the loop label, but once it reaches next, seemingly random values start showing up in various registers. Please help me debug it so that it prints the factorial.
As @ped7g points out, you're doing several things wrong: using the int 0x80 32-bit ABI in 64-bit code, and passing character values instead of pointers to the write() system call.
Here's how to print an integer in x86-64 Linux, the simple and somewhat-efficient (see footnote 1) way, using the same repeated division / modulo by 10.
System calls are expensive (probably thousands of cycles for write(1, buf, 1)), and doing a syscall inside the loop steps on registers so it's inconvenient and clunky as well as inefficient. We should write the characters into a small buffer, in printing order (most-significant digit at the lowest address), and make a single write() system call on that.
But then we need a buffer. The maximum length of a 64-bit integer is only 20 decimal digits, so we can just use some stack space. In x86-64 Linux, we can use stack space below RSP (up to 128B) without "reserving" it by modifying RSP. This is called the red-zone. If you wanted to pass the buffer to another function instead of a syscall, you would have to reserve space with sub $24, %rsp or something.
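As a rough C picture of the same plan (fill a small local buffer backwards, then issue one write()), consider the sketch below. The name print_uint64_fd and the fd parameter are assumptions added for illustration; a local array stands in for the stack space.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical C counterpart: formats value in decimal plus a trailing
 * newline into a local buffer and emits it with one write() call.
 * Returns whatever write() returns. */
ssize_t print_uint64_fd(int fd, uint64_t value)
{
    char buf[21];                   /* 20 digits at most, plus '\n' */
    char *end = buf + sizeof buf;   /* one past the end of the buffer */
    char *p = end;

    *--p = '\n';
    do {                            /* runs at least once, so 0 prints "0" */
        *--p = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    return write(fd, p, (size_t)(end - p));   /* length = end - start */
}
```

Storing the digits backwards from the end means the most-significant digit ends up at the lowest address, so no reversal pass is needed before the write.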
Instead of hard-coding system-call numbers, using GAS makes it easy to use the constants defined in .h files. Note the mov $__NR_write, %eax near the end of the function. The x86-64 SystemV ABI passes system-call arguments in similar registers to the function-calling convention. (So it's totally different from the 32-bit int 0x80 ABI, which you shouldn't use in 64-bit code.)
// building with gcc foo.S will use CPP before GAS so we can use headers
#include <asm/unistd.h> // This is a standard Linux / glibc header file
// includes unistd_64.h or unistd_32.h depending on current mode
// Contains only #define constants (no C prototypes) so we can include it from asm without syntax errors.
.p2align 4
.globl print_uint64 #void print_uint64(uint64_t value)
print_uint64:
lea -1(%rsp), %rsi # We use the 128B red-zone as a buffer to hold the string
# a 64-bit integer is at most 20 digits long in base 10, so it fits.
movb $'\n', (%rsi) # store the trailing newline byte. (Right below the return address).
# If you need a null-terminated string, leave an extra byte of room and store '\n\0'. Or push $'\n'
mov $10, %ecx # same as mov $10, %rcx but 2 bytes shorter
# note that newline (\n) has ASCII code 10, so we could actually have stored the newline with movb %cl, (%rsi) to save code size.
mov %rdi, %rax # function arg arrives in RDI; we need it in RAX for div
.Ltoascii_digit: # do{
xor %edx, %edx
div %rcx # rax = rdx:rax / 10. rdx = remainder
# store digits in MSD-first printing order, working backwards from the end of the string
add $'0', %edx # integer to ASCII. %dl would work, too, since we know this is 0-9
dec %rsi
mov %dl, (%rsi) # *--p = (value%10) + '0';
test %rax, %rax
jnz .Ltoascii_digit # } while(value != 0)
# If we used a loop-counter to print a fixed number of digits, we would get leading zeros
# The do{}while() loop structure means the loop runs at least once, so we get "0\n" for input=0
# Then print the whole string with one system call
mov $__NR_write, %eax # call number from asm/unistd_64.h
mov $1, %edi # fd=1
# %rsi = start of the buffer
mov %rsp, %rdx
sub %rsi, %rdx # length = one_past_end - start
syscall # write(fd=1 /*rdi*/, buf /*rsi*/, length /*rdx*/); 64-bit ABI
# rax = return value (or -errno)
# rcx and r11 = garbage (destroyed by syscall/sysret)
# all other registers = unmodified (saved/restored by the kernel)
# we don't need to restore any registers, and we didn't modify RSP.
ret
To test this function, I put this in the same file to call it and exit:
.p2align 4
.globl _start
_start:
mov $10120123425329922, %rdi
# mov $0, %edi # Yes, it does work with input = 0
call print_uint64
xor %edi, %edi
mov $__NR_exit, %eax
syscall # sys_exit(0)
I built this into a static binary (with no libc):
$ gcc -Wall -static -nostdlib print-integer.S && ./a.out
10120123425329922
$ strace ./a.out > /dev/null
execve("./a.out", ["./a.out"], 0x7fffcb097340 /* 51 vars */) = 0
write(1, "10120123425329922\n", 18) = 18
exit(0) = ?
+++ exited with 0 +++
$ file ./a.out
./a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=69b865d1e535d5b174004ce08736e78fade37d84, not stripped
Footnote 1: See Why does GCC use multiplication by a strange number in implementing integer division? for avoiding div r64 for division by 10, because that's very slow (21 to 83 cycles on Intel Skylake). A multiplicative inverse would make this function actually efficient, not just "somewhat". (But of course there'd still be room for optimizations...)
Related: Linux x86-32 extended-precision loop that prints 9 decimal digits from each 32-bit "limb": see .toascii_digit: in my Extreme Fibonacci code-golf answer. It's optimized for code-size (even at the expense of speed), but well-commented.
It uses div like you do, because that's smaller than using a fast multiplicative inverse. It uses loop for the outer loop (over multiple integers for extended precision), again for code-size at the cost of speed.
It uses the 32-bit int 0x80 ABI, and prints into a buffer that was holding the "old" Fibonacci value, not the current.
Another way to get efficient asm is from a C compiler. For just the loop over digits, look at what gcc or clang produce for this C source (which is basically what the asm is doing). The Godbolt Compiler explorer makes it easy to try with different options and different compiler versions.
See gcc7.2 -O3 asm output which is nearly a drop-in replacement for the loop in print_uint64 (because I chose the args to go in the same registers):
void itoa_end(unsigned long val, char *p_end) {
const unsigned base = 10;
do {
*--p_end = (val % base) + '0';
val /= base;
} while(val);
// write(1, p_end, orig-current);
}
I tested performance on a Skylake i7-6700k by commenting out the syscall instruction and putting a repeat loop around the function call. The version with mul %rcx / shr $3, %rdx is about 5 times faster than the version with div %rcx for storing a long number-string (10120123425329922) into a buffer. The div version ran at 0.25 instructions per clock, while the mul version ran at 2.65 instructions per clock (although requiring many more instructions).
It might be worth unrolling by 2, and doing a divide by 100 and splitting up the remainder of that into 2 digits. That would give a lot better instruction-level parallelism, in case the simpler version bottlenecks on mul + shr latency. The chain of multiply/shift operations that brings val to zero would be half as long, with more work in each short independent dependency chain to handle a 0-99 remainder.
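A possible shape for that unrolled variant, written in C so a compiler can pick the multiplicative inverses, is sketched below. The name itoa_end_by100 is hypothetical and this version has not been benchmarked; it only illustrates the divide-by-100 idea.

```c
#include <stdint.h>

/* Hypothetical unrolled variant: peels off two decimal digits per
 * iteration by dividing by 100. Digits are stored backwards, ending
 * just before p_end. Returns a pointer to the first digit. */
char *itoa_end_by100(uint64_t val, char *p_end)
{
    while (val >= 100) {
        unsigned rem = (unsigned)(val % 100);   /* 0..99 */
        val /= 100;
        *--p_end = (char)('0' + rem % 10);      /* low digit of the pair */
        *--p_end = (char)('0' + rem / 10);      /* high digit of the pair */
    }
    /* val is now 0..99: emit one or two digits without a leading zero */
    if (val >= 10) {
        *--p_end = (char)('0' + val % 10);
        val /= 10;
    }
    *--p_end = (char)('0' + val);
    return p_end;
}
```

The split of each 0-99 remainder does not feed back into val, so that work is independent of the main division chain.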
Related:
NASM version of this answer, for x86-64 or i386 Linux How do I print an integer in Assembly Level Programming without printf from the c library?
How to convert a binary integer number to a hex string? - Base 16 is a power of 2, conversion is much simpler and doesn't require div.
Several things:
0) I guess this is 64b linux environment, but you should have stated so (if it is not, some of my points will be invalid)
1) int 0x80 is 32b call, but you are using 64b registers, so you should use syscall (and different arguments)
2) int 0x80, eax=4 requires the ecx to contain address of memory, where the content is stored, while you give it the ASCII character in ecx = illegal memory access (the first call should return error, i.e. eax is negative value). Or using strace <your binary> should reveal the wrong arguments + error returned.
3) why addq $4, %rsp? Makes no sense to me, you are damaging rsp, so the next pop rcx will pop wrong value, and in the end you will run way "up" into the stack.
... maybe some more, I didn't debug it, this list is just by reading the source (so I may be even wrong about something, although that would be rare).
BTW, your code is working. It just doesn't do what you expected. It works fine, precisely as the CPU is designed and precisely as you wrote it in the code. Whether that achieves what you wanted, or makes sense, is a different topic, but don't blame the hardware or the assembler.
... I can do a quick guess how the routine may be fixed (just partial hack-fix, still needs rewrite for syscall under 64b linux):
next:
cmpq $0, %rsi
jz bye
movq %rsp,%rcx # make ecx point to stack memory (with stored char)
# this will work if you are lucky enough that rsp fits into 32b
# if it is beyond 4GiB logical address, then you have bad luck (syscall needed)
decq %rsi
movq $4, %rax
movq $1, %rbx
movq $1, %rdx
int $0x80
addq $8, %rsp # now rsp += 8 is needed, because there's no POP
jmp next
Again, I didn't try this myself; I'm just writing it from memory, so let me know how it changes the situation.
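The central point (write() needs the address of the byte, not the byte's value) is easy to see in C. The sketch below mimics the original push/pop scheme with a small array used as a stack; write_popped_digits is a hypothetical name, and one write per digit is kept only to mirror the question's loop.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical illustration: digits are "pushed" least-significant
 * first, then "popped" and written one byte at a time. Each write()
 * receives &stack[i], a pointer to the stored character.
 * Returns the number of digits written, or -1 on a failed write. */
ssize_t write_popped_digits(int fd, unsigned long value)
{
    char stack[20];                 /* enough for a 64-bit value */
    size_t depth = 0;
    ssize_t written = 0;

    do {                            /* push phase */
        stack[depth++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    while (depth > 0) {             /* pop phase */
        if (write(fd, &stack[--depth], 1) != 1)
            return -1;
        written++;
    }
    return written;
}
```

Called with fd 1 and the value 120 (5 factorial), this emits the three characters 1, 2 and 0 in that order.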

The assembly code (x86) with jumps and a syscall read function

I would like to ask for help with understanding some assembly code. My problem is this:
The code after the label L2 is important: it calls a subroutine. But it seems to me that the program would never reach the code after L2, because, as I understand it, the read syscall (after L1) always returns 0, which is then compared to 1. Zero never equals one, so it seems the program never jumps to L2. I guess I must be wrong. I would really appreciate any help.
jmp L1
L2:
movzbl -0x11(%ebp), %eax
movsbl %al, %eax
mov %eax, (%esp)
call SUBROUTINE_FNC
<...>
L1:
mov $0x0, %ebx
lea -0x11(%ebp), %ecx
mov $0x1, %edx
mov $0x3, %eax
int $0x80
mov %eax, -0x10(%ebp)
cmpl $0x1, -0x10(%ebp)
je L2
The syscall corresponds to read and it looks like you are trying to read one byte at a time. read should return the number of actual bytes read, so if the call is successful then you will get a return value of 1, the compare will be true, and you will jump to L2, i.e.
goto L1; // the asm starts with jmp L1, so the first read precedes the first call
L2:
SUBROUTINE_FNC(...);
L1:
if (read(fd, buff, 1) == 1) // read one byte
goto L2; // if one byte read then loop to L2
or, in a more structured form:
while (read(fd, buff, 1) == 1)
{
SUBROUTINE_FNC(...)
}
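A compilable version of that loop might look like the sketch below. The names handle_byte and struct byte_stats are hypothetical stand-ins, since the real SUBROUTINE_FNC is not shown in the question.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical accumulator so the loop has something observable to do. */
struct byte_stats {
    size_t total;      /* bytes handled */
    size_t newlines;   /* how many of them were '\n' */
};

/* Stand-in for SUBROUTINE_FNC: receives the byte widened to int. */
static void handle_byte(int c, struct byte_stats *st)
{
    st->total++;
    if (c == '\n')
        st->newlines++;
}

/* Reads one byte per system call; the loop ends as soon as read()
 * returns anything other than 1 (0 at end of input, -1 on error). */
void read_loop(int fd, struct byte_stats *st)
{
    char c;
    while (read(fd, &c, 1) == 1)
        handle_byte(c, st);
}
```

As long as input is available, read() returns 1 here, not 0, which is why the branch to L2 in the disassembly is taken.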

Finding escape characters in AT&T x86 assembly

Question 1: I have the following assembler code, whose purpose is to loop through an input string, and count the number of escape characters '%' it encounters:
.globl sprinter
.data
.escape_string: .string "%"
.num_escape: .long 0
.num_characters: .long 0
.text
sprinter:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%ecx # %ecx = parameter 1
loop:
cmpb $0, (%ecx) # if end of string reached
jz exit
cmpl $.escape_string,(%ecx) # if escape character found
je increment
back:
incl .num_characters
incl %ecx
jmp loop
increment:
incl .num_escape
jmp back # jump to 'back'
exit:
movl .num_escape, %eax # return num_escape
popl %ebp
ret
This assembly code is compiled together with the following C code:
#include <stdio.h>
extern int sprinter (char* string);
int main (void)
{
int n = sprinter("a %d string of some %s fashion!");
printf("value: %d",n);
return 0;
}
The expected output from running this code is value: 2 (because there are two '%' characters in the string), but it returns value: 0, meaning the following line fails (because it never increments the counter):
cmpl $.escape_string,(%ecx) # if escape character found
Am I using the wrong method of comparing for the string? The outer loop works fine, and .num_characters correctly contains the number of characters in my string. I generated some assembly code for a simple C-program that compared a string "hello" to "hello2", and this is the relevant code:
.LC0:
.string "hello"
.LC1:
.string "hello2"
...
movl $.LC0, -4(%ebp)
cmpl $.LC1, -4(%ebp)
It looks very similar to what I tried, no?
Question 2. This code is part of what is going to be a simplified sprintf-function written in assembly. This means the first parameter should be the result string, and the second parameter is the formatting. How do I copy a byte character from our current position in one register to our current position in another register? Let's assume we've assigned our parameters into two registers:
movl 8(%ebp),%edx # %edx = result-string
movl 12(%ebp),%ecx # %ecx = format-string
I tried the following in the loop:
movb (%ecx), %al
movb %al, (%edx) # copy current character to current position in result register
incl %ecx
incl %edx
But the result string just contains a (the first character in my string), and not the full string as I expected.
All help appreciated because this comparison problem (question 1) is currently keeping me stuck.
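Regarding question 1: you are comparing single-byte characters, so cmpl should be cmpb when checking for the escape character. As written, cmpl $.escape_string, (%ecx) compares four bytes of the string against the address of the .escape_string label, not against the '%' character stored there, so it never matches. You also need to load the character into a register first. I'm not really familiar with AT&T assembly, so I hope this is correct.
Before loop:
movb .escape_string, %al
Comparison:
cmpb %al, (%ecx)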
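For reference, the byte-wise comparison the loop is meant to perform looks like this in C. This is a minimal sketch; count_char is a hypothetical name.

```c
#include <stddef.h>

/* Counts how many bytes of the NUL-terminated string s equal wanted.
 * Each step compares exactly one byte, like cmpb in the assembly loop. */
size_t count_char(const char *s, char wanted)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == wanted)
            n++;
    }
    return n;
}
```

With the string from the question, count_char("a %d string of some %s fashion!", '%') returns 2, which is the value the assembly version is expected to produce.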

IA 32 read command line argument

I am trying to read command line arguments with assembly code for IA 32. I found an explanation of how to do it here http://www.paladingrp.com/ia32.shtml. I am able to use the stack pointer to get the number of arguments but I am not able to get the value of the arguments.
Here is what I am trying to do:
movl 8(%esp), %edx # Move pointer to argument 1 to edx
movl (%edx), %ebx # Move value of edx to ebx
movl $1, %eax # opcode for exit system call in eax
int $0x80 # return
Am I getting the correct pointer? If so, how do I get the value of it? If not, how do I get the correct pointer?
movl (%edx), %ebx # Move value of edx to ebx
That doesn't move value of EDX to EBX (the comment is incorrect).
That dereferences pointer in EDX, and puts the result of dereference into EBX. So if you invoked your program with ./a.out foo, then EBX will end up being 0x006f6f66 (== '\0oof' ("foo\0" in little-endian)).
I am guessing that's not what you wanted, but your question is not very clear about what you are expecting to happen where.
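The value quoted above can be reproduced in C by assembling four bytes in little-endian order, which is what a 32-bit load does on x86. The helper name load_le32 is hypothetical.

```c
#include <stdint.h>

/* Builds the 32-bit value that a little-endian 4-byte load would
 * produce from the bytes at p: the byte at the lowest address ends up
 * in the least-significant position. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For the bytes 'f', 'o', 'o', '\0' this gives 0x006f6f66. Assuming the usual Linux behaviour of reporting only the low 8 bits of the exit status, a shell would then show 102 (0x66, the code for 'f') rather than the whole value.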

Resources