How does this x86 Assembly code create a string?

I'm studying the x86 assembly language. In order to better understand what's going on behind the scenes of string creation, I have a sample program that just prints a string. GCC produced the following Assembly program, and I'm having trouble understanding the compiler's output:
Assembly Code:
Dump of assembler code for function main:
0x0000000000400596 <+0>: push %rbp
0x0000000000400597 <+1>: mov %rsp,%rbp
0x000000000040059a <+4>: sub $0x10,%rsp
0x000000000040059e <+8>: movq $0x400668,-0x8(%rbp)
0x00000000004005a6 <+16>: mov -0x8(%rbp),%rax
0x00000000004005aa <+20>: mov %rax,%rsi
=> 0x00000000004005ad <+23>: mov $0x400675,%edi
0x00000000004005b2 <+28>: mov $0x0,%eax
0x00000000004005b7 <+33>: callq 0x4004a0 <printf@plt>
0x00000000004005bc <+38>: mov $0x0,%eax
0x00000000004005c1 <+43>: leaveq
0x00000000004005c2 <+44>: retq
C Code:
#include <stdio.h>
int main()
{
char *me = "abcdefghijkl";
printf("%s",me);
}
At the conceptual level, I understand that the stack pointer is being subtracted to allocate memory on the stack, and then somehow, and this is the part I'm having trouble understanding the mechanics of, the program creates the string.
Can someone please help?
Thanks.

It's a lot clearer if you use the -S flag to gcc to create an assembly file for your program (gcc -S asm.c). This generates an asm.s file:
.file "asm.c"
.section .rodata
.LC0:
.string "abcdefghijkl"
.LC1:
.string "%s"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq $.LC0, -8(%rbp)
movq -8(%rbp), %rax
movq %rax, %rsi
movl $.LC1, %edi
movl $0, %eax
call printf
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-36)"
.section .note.GNU-stack,"",@progbits
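From this you can see that the string is just initialized memory in the .rodata section, labeled .LC0. The program never builds the string at run time: main only stores the address of .LC0 in the local variable at -8(%rbp) (movq $.LC0, -8(%rbp), which shows up as $0x400668 in your disassembly) and then passes that address to printf in %rsi. Changing the bytes at that label in the assembly source changes the string.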
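To see the same thing from the C side, here is a minimal sketch (the helper names get_me and array_is_separate_copy are made up for illustration). It relies only on standard C rules: a string literal has static storage, so a pointer initialized from it just receives an address, whereas a char array initialized from the same literal gets its own copy of the bytes.

```c
/* Hypothetical helper: returns the address of the literal.
   No bytes are copied; the pointer simply holds the literal's address. */
static const char *get_me(void)
{
    const char *me = "abcdefghijkl";
    return me;
}

/* Hypothetical helper: returns 1 if a char array initialized from the same
   literal lives at a different address than the literal itself. */
static int array_is_separate_copy(void)
{
    char copy[] = "abcdefghijkl";   /* 13 bytes copied into the stack frame */
    return (const char *)copy != get_me();
}
```

Calling get_me() twice yields the same address both times, which matches the single movq $.LC0 in the listing. Writing through that pointer would be undefined behavior, since the literal is typically placed in read-only memory.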

Related

Calling printf to output an integer crashes in assembly code [duplicate]

This question already has answers here:
Why does Windows64 use a different calling convention from all other OSes on x86-64?
(4 answers)
How to write hello world in assembly under Windows?
(9 answers)
Closed last year.
I ran gcc s2.asm in a Windows 7 console window, and an exe file was generated.
When I run a.exe, it crashes. Why?
s2.asm was generated from the following source code:
{
int m;
m = 1;
iprint (m) ;
}
s2.asm is as follows:
IO:
.string "%lld"
.text
.globl main
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
pushq $1
movq %rbp, %rax
leaq -8(%rax), %rax
popq (%rax)
movq %rbp, %rax
leaq -8(%rax), %rax
movq (%rax), %rax
pushq %rax
popq %rsi
leaq IO(%rip), %rdi
movq $0, %rax
callq printf
leaveq
retq
I installed tdm64-gcc-10.3.0-21.exe on my Windows machine, so I have a 64-bit gcc.
But why does a.exe crash?
Thank you.
Hi all, thank you for your replies.
I am a fan of compiler technology and want to build a toy compiler by myself. It is very simple but supports 64 bits, and its output can be assembled by gcc on Windows and run in the Windows console.
I came across a compiler written in OCaml by Professor Mune on his site;
the assembly it generates seems very simple.
64-bit OCaml is set up on my PC,
but that only provides an OCaml compiler; there is no gcc.
With the toy compiler, 80x86 assembly code can be generated.
To convert the assembly code to an executable file, I installed Embarcadero_Dev-Cpp_6.3_TDM-GCC_9.2.
Running gcc tmp.s then generates an a.exe file,
but a.exe cannot be run successfully on Windows.
The code is provided on the site [1].
But I have limited knowledge of assembly.
The site provides assembly code emitter modules
for Linux, for Cygwin, and for old OCaml.
In the end I reconsidered the code and selected the Cygwin emitter.
It generates assembly like the following, and the resulting a.exe finally ran successfully.
IO:
.string "%lld"
.text
.globl main
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
pushq $1
movq %rbp, %rax
leaq -8(%rax), %rax
popq (%rax)
movq %rbp, %rax
leaq -8(%rax), %rax
movq (%rax), %rax
pushq %rax
popq %rdx
leaq IO(%rip), %rcx
subq $32, %rsp
callq printf
addq $32, %rsp
leaveq
retq
$ ./a.exe
1
Note: the above assembly was not optimized.
After optimization, the code becomes the following:
IO:
.string "%lld"
.text
.globl main
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
pushq $1
movq %rbp, %rax
leaq -8(%rax), %rax
popq (%rax)
movq %rbp, %rax
leaq -8(%rax), %rax
movq (%rax), %rax
movq %rax, %rdx
leaq IO(%rip), %rcx
subq $32, %rsp
callq printf
addq $32, %rsp
leaveq
retq
I found that the two instructions
pushq %rax
popq %rdx
became a single instruction:
movq %rax, %rdx
The issue is now resolved. It was caused by my own mistake, since I was not really clear about the assembly code emitter modules.
Thank you all.
[1] https://www.ed.tus.ac.jp/j-mune/ccp/

X86_64 Assembly code segfaults and gives stack smashing error

So, for this assignment I have to write an assembly "function" to be called from C code. Given an integer and a memory address (the address of a char array, to be used as a string), the function should convert the integer to a string whose starting address is the given memory address.
I'm on Ubuntu Linux, by the way.
Here's the assembly code (I tried to follow the Linux x86_64 ABI calling conventions; it is in AT&T syntax):
.global dec
.type dec, @function
.text
dec:
######################### Subroutine prologue
push %rbp # Save the base pointer
movq %rsp, %rbp # Make the stack pointer the new base pointer
push %rdi # Stack parameter 1
push %rsi # Stack parameter 2
push %rbx # Save callee-saved registers
push %r12
push %r13
push %r14
push %r15
######################### Subroutine body
movq %rdi, %rax
xor %rcx, %rcx
addDigit:
cmp $0, %rax
je putMem
xor %rdx, %rdx
mov $10, %ebx
div %ebx
addq $'0', %rdx
pushq %rdx
inc %rcx
jmp addDigit
putMem:
cmp $0, %rcx
je endProg
popq (%rsi)
add $1, %rsi
dec %rcx
jmp putMem
endProg:
movq $0x0, (%rsi)
movq -16(%rbp), %rsi
mov $1, %rax
######################### Subroutine epilogue
popq %r15 # Restore callee-saved registers
popq %r14
popq %r13
popq %r12
popq %rbx
movq %rbp, %rsp # Reset stack to base pointer.
popq %rbp # Restore the old base pointer
ret # Return to caller
And here is my C code:
extern int dec(int num, char* c);
#include <stdio.h>
int main(){
char* a = "Test\n";
dec(0x100, a);
printf("Num: %s\n", a);
}
It compiles without any problems, but when I try to run, it segfaults.
I've tried debugging it with gdb, and apparently the problem occurs when I try to run the instruction
pop (%rsi)
So, I made a few changes in my C code:
extern int dec(int num, char* c);
#include <stdio.h>
int main(){
char c;
dec(0x100, &c);
printf("Num: %s\n", &c);
}
Now, when I attempt to run it, I get this message:
Num: 256
*** stack smashing detected ***: ./teste.out terminated
Aborted (core dumped)
Can someone help me understand what's going on here and how do I fix my code?
Thanks in advance.
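For comparison, here is a rough C sketch of what the routine is meant to do (dec_sketch is a hypothetical name, not the assignment's required interface). It assumes the caller passes a writable buffer with room for every digit plus the terminating NUL. Neither a string literal (usually placed in read-only memory) nor a single char satisfies that assumption, which is consistent with the segfault and the stack-smashing report above.

```c
/* Hypothetical C counterpart of the dec routine: writes the decimal digits
   of num into out, NUL-terminates, and returns the number of digits.
   Assumption: out points to writable storage large enough for the result. */
static int dec_sketch(unsigned int num, char *out)
{
    char tmp[3 * sizeof(unsigned int) + 1];  /* generous bound on digit count */
    int n = 0;
    int i;

    /* Produce digits least-significant first (do/while so 0 yields "0"). */
    do {
        tmp[n++] = (char)('0' + num % 10);
        num /= 10;
    } while (num != 0);

    /* Reverse into the caller's buffer, one byte per digit. */
    for (i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

With a caller-side buffer such as char buf[16], dec_sketch(0x100, buf) leaves "256" in buf. Note that each digit is stored as a single byte, whereas popq (%rsi) and movq $0x0, (%rsi) in the assembly each write eight bytes.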

How do canary words allow gcc to detect buffer overflows?

I tested this using strcpy() with a source string larger than the destination:
#include <stdlib.h>
#include <string.h>
int main() {
char *ptr = malloc(12);
strcpy(ptr,"hello world!");
return 0;
}
Compiling with the flag -fstack-protector and using the -S option I got:
.file "malloc.c"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movq $0, -16(%rbp)
movl $12, %edi
call malloc
movq %rax, -16(%rbp)
movq -16(%rbp), %rax
movabsq $8022916924116329800, %rdx
movq %rdx, (%rax)
movl $560229490, 8(%rax)
movb $0, 12(%rax)
movl $0, %eax
movq -8(%rbp), %rcx
xorq %fs:40, %rcx
je .L3
call __stack_chk_fail
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
Could someone explain to me how this works? And why isn't the "canary word" also overwritten by the \0 of the hello world! string?
Could someone explain to me how this works?
The canary word is read from %fs:40 and stored at the top of the frame here:
movq %fs:40, %rax
movq %rax, -8(%rbp)
It sits below the return address, so if your code overflows a stack buffer (which will be below -8(%rbp)), it will overwrite the -8(%rbp) location before it reaches the return address. The code GCC emits just before ret detects this:
movq -8(%rbp), %rcx
xorq %fs:40, %rcx # Checks that %fs:40 == -8(%rbp)
je .L3 # OK, return
call __stack_chk_fail # Die
because the overwritten contents of -8(%rbp) are very likely to differ from the proper value (copied from %fs:40).
And why isn't the canary word also overwritten by the \0 of hello world!?
Your code has a heap overflow, not a stack buffer overflow, so SSP can't help. The 12 bytes returned by malloc cannot hold the 13 bytes of "hello world!" including its terminator, so the stray \0 lands in heap memory at 12(%rax), while the canary lives in main's stack frame at -8(%rbp).
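The check can be imitated in plain C. The sketch below is only a simulation under stated assumptions: GUARD is a made-up constant standing in for the value at %fs:40, and the "frame" is a single byte array holding a 16-byte buffer followed by the canary slot, so the overflow stays within one object and is well-defined. It is not how GCC implements the protector, just the same store-then-compare idea.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Hypothetical stand-in for the per-thread guard value at %fs:40. */
static const uint64_t GUARD = 0x1122334455667700ULL;

/* Copies len bytes from src into a simulated frame laid out as
   [ buffer: BUF_SIZE bytes ][ canary: 8 bytes ] and returns 1 if the
   canary is still intact afterwards, 0 if the copy clobbered it.
   Assumption: src has at least len readable bytes. */
static int canary_intact_after_copy(const char *src, size_t len)
{
    unsigned char frame[BUF_SIZE + sizeof GUARD];
    uint64_t check;

    if (len > sizeof frame)
        len = sizeof frame;                        /* stay inside the simulation */

    memcpy(frame + BUF_SIZE, &GUARD, sizeof GUARD); /* "prologue": plant canary */
    memcpy(frame, src, len);                        /* unchecked copy into buffer */
    memcpy(&check, frame + BUF_SIZE, sizeof check); /* "epilogue": reload canary */

    return check == GUARD;                          /* mismatch means overflow */
}
```

Copying the 13 bytes of "hello world!" fits in the 16-byte buffer and leaves the canary intact; copying 20 bytes runs 4 bytes into the canary slot and the comparison fails, which is the point at which the real code would call __stack_chk_fail.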

Use of general-purpose registers in Linux/x86

This is a general question (e.g. on Linux and x86):
Is it true that without making a system call, a regular C program will not (implicitly) use any of the general-purpose registers?
This program:
#include <stdio.h>
int main(int argc, char **argv) {
printf("%d", argc);
}
Produces this assembly:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 10
.globl _main
.align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
movl %edi, %ecx
leaq L_.str(%rip), %rdi
xorl %eax, %eax
movl %ecx, %esi
callq _printf
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "%d"
.subsections_via_symbols
Which clearly uses general-purpose registers: eax, ecx, esi, and rdi, plus rbp and rsp for the stack frame. Additionally, note that there are no system calls in this code, just a function call to libc's _printf.

Printing floating point numbers in assembler

I'm trying to print a floating-point value from assembler by calling the printf function. It works fine with strings and integer values but fails when printing floats. Here is an example of working code:
global main
extern printf
section .data
message: db "String is: %d %x %s", 10, 0
end_message: db ".. end of string", 0
section .text
main:
mov eax, 0xff
mov edi, message
movsxd rsi, eax
mov rdx, 0xff
mov rcx, end_message
xor rax, rax
call printf
ret
String is: 255 ff .. end of string
So, the parameters are passed through registers: edi contains address of a formatting string, rsi and rdx contain the same number to print in decimal and hex styles, rcx contains end of a string, rax contains 0 as we do not have a float to print.
This code works fine but something changes while trying to print float:
global main
extern printf
section .data
val: dq 123.456
msg: db "Result is: %fl",10, 0
section .text
main:
mov rdi,msg
movsd xmm0,[val]
mov eax,1
call printf
mov rax, 0
ret
This code snippet assembles, but crashes with a segmentation fault when executed. It seems that the problem is a wrong value in xmm0, but changing movsd xmm0,[val] to movsd xmm0,val gives an
error: invalid combination of opcode and operands
message.
The compiler is NASM running on openSuSe 12.3
Update. I tried to make a c program and produce a .S assembly. It gives a very weird solution:
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movq val(%rip), %rax
movq %rax, -24(%rbp)
movsd -24(%rbp), %xmm0
movl $.LC0, %edi
movl $1, %eax
call printf
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
Is it possible to write a simple printf example?
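For your assembler problem:
you need to align the stack before calling printf. The x86-64 System V ABI requires rsp to be 16-byte aligned at each call instruction, but on entry to main the return address has just been pushed, so rsp is off by 8. When floating-point arguments are involved, printf may use aligned SSE stores, which fault on a misaligned stack.
Insert
sub rsp, 8
right after main:
then undo it before ret:
add rsp, 8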
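At the C level, the call being reproduced looks roughly like the sketch below. format_result is a made-up helper, and it uses snprintf instead of printf only so that the output can be inspected; the argument passing is the same kind of variadic call, with the double travelling in xmm0 and al set to 1 in the assembly version.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one double with %f into out.
   Returns the number of characters that were (or would have been) written,
   as snprintf does. */
static int format_result(char *out, size_t size, double val)
{
    return snprintf(out, size, "Result is: %f", val);
}
```

This sketch uses %f on purpose: the %fl in the question's format string would print the number followed by a literal l character.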

Resources