I am trying to learn nasm. I want to make a program that prints "Hello, world." n times (in this case 10). I am trying to save the loop register value in a constant so that it is not changed when the body of the loop is executed. When I try to do this I receive a segmentation fault error. I am not sure why this is happening.
My code:
SECTION .DATA
print_str: db 'Hello, world.', 10
print_str_len: equ $-print_str
limit: equ 10
step: dw 1
SECTION .TEXT
GLOBAL _start
_start:
mov eax, 4 ; 'write' system call = 4
mov ebx, 1 ; file descriptor 1 = STDOUT
mov ecx, print_str ; string to write
mov edx, print_str_len ; length of string to write
int 80h ; call the kernel
mov eax, [step] ; moves the step value to eax
inc eax ; Increment
mov [step], eax ; moves the eax value to step
cmp eax, limit ; Compare sil to the limit
jle _start ; Loop while less or equal
exit:
mov eax, 1 ; 'exit' system call
mov ebx, 0 ; exit with error code 0
int 80h ; call the kernel
The result:
Hello, world.
Segmentation fault (core dumped)
The cmd:
nasm -f elf64 file.asm -o file.o
ld file.o -o file
./file
section .DATA is the direct cause of the crash. Lower-case section .data is special, and linked as a read-write (private) mapping of the executable. Section names are case-sensitive.
Upper-case .DATA is not special for nasm or the linker, and it ends up as part of the text segment mapped read+exec without write permission.
Upper-case .TEXT is also weird: by default objdump -drwC -Mintel only disassembles the .text section (to avoid disassembling data as if it were code), so it shows empty output for your executable.
On newer systems, the default flags for a section name NASM doesn't recognize don't include exec permission, so code in .TEXT will segfault. This is the same problem as in "Assembly section .code and .text behave differently".
After starting the program under GDB (gdb ./foo, starti), I looked at the process's memory map from another shell.
$ cat /proc/11343/maps
00400000-00401000 r-xp 00000000 00:31 110651257 /tmp/foo
7ffff7ffa000-7ffff7ffd000 r--p 00000000 00:00 0 [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso]
7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0 [stack]
As you can see, other than the special VDSO mappings and the stack, there's only the one file-backed mapping, and it has read+exec permission only.
Single-stepping inside GDB, the mov eax,DWORD PTR ds:0x400086 load succeeds, but the mov DWORD PTR ds:0x400086,eax store faults. (See the bottom of the x86 tag wiki for GDB asm tips.)
From readelf -a foo, we can see the ELF program headers that tell the OS's program loader how to map it into memory:
$ readelf -a foo # broken version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000bf 0x00000000000000bf R 0x200000
Section to Segment mapping:
Segment Sections...
00 .DATA .TEXT
Notice how both .DATA and .TEXT are in the same segment. This is what you'd want for section .rodata (a standard section name where you should put read-only constant data like your string), but it won't work for mutable global variables.
After fixing your asm to use section .data and .text, readelf shows us:
$ readelf -a foo # fixed version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000e7 0x00000000000000e7 R E 0x200000
LOAD 0x00000000000000e8 0x00000000006000e8 0x00000000006000e8
0x0000000000000010 0x0000000000000010 RW 0x200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
Notice how segment 00 is R + E without W, and the .text section is in there. Segment 01 is RW (read + write) without exec, and the .data section is there.
The LOAD type means these segments are mapped into the process's virtual address space. Some sections (like debug info) aren't, and are just metadata for other tools. But NASM flags unknown section names as progbits, i.e. loaded, which is why the program could link and why loading it didn't segfault.
After fixing it to use section .data, your program runs without segfaulting.
The loop runs for one iteration, because the 2 bytes following step: dw 1 are not zero. After the dword load, RAX = 0x2c0001 on my system. (cmp between 0x002c0002 and 0xa makes the LE condition false because it's not less or equal.)
dw means "data word" or "define word". Use dd for a data dword.
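Putting the two fixes together (lower-case section names and a dword-sized counter), a minimal corrected sketch of the question's program might look like the following. It keeps the question's int 0x80 calls and build commands, on the assumption that the executable is static and non-PIE so its data addresses fit in 32 bits.

```nasm
; Assumed build, as in the question:
;   nasm -f elf64 file.asm -o file.o && ld file.o -o file
section .data                       ; lower-case name: linked read+write
print_str:      db 'Hello, world.', 10
print_str_len:  equ $ - print_str
limit:          equ 10
step:           dd 1                ; dword, to match the 4-byte load/store below

section .text                       ; lower-case name: linked read+exec
global _start
_start:
    mov   eax, 4                    ; 'write' system call (32-bit ABI)
    mov   ebx, 1                    ; file descriptor 1 = stdout
    mov   ecx, print_str            ; string to write
    mov   edx, print_str_len        ; length of string to write
    int   80h

    mov   eax, [step]               ; load the 4-byte counter
    inc   eax
    mov   [step], eax               ; store it back (needs a writable page)
    cmp   eax, limit
    jle   _start                    ; loop while step <= limit

    mov   eax, 1                    ; 'exit' system call
    mov   ebx, 0                    ; exit status 0
    int   80h
```

With step starting at 1 and the jle test after the increment, the write runs for step = 1 through 10.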
BTW, there's no need to keep your loop counter in memory. You're not using RDI, RSI, RBP, or R8..R15 for anything so you could just keep it in a register. Like mov edi, limit before the loop, and dec edi / jnz at the bottom.
But actually you should use the 64-bit syscall ABI if you want to build 64-bit code, not the 32-bit int 0x80 ABI; see "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?". Or build 32-bit executables if you're following a guide or tutorial written for that.
Anyway, in that case you'd be able to use ebx as your loop counter, because the syscall ABI uses different registers for its args.
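As a rough sketch of that suggestion, here is one way the loop could look with the 64-bit syscall ABI and the counter kept in EBX. The .rodata placement and the .loop label are choices made for this example, not something taken from the question.

```nasm
; Assumed build: nasm -f elf64 hello.asm -o hello.o && ld hello.o -o hello
section .rodata                     ; read-only constant data
print_str:      db 'Hello, world.', 10
print_str_len:  equ $ - print_str
limit:          equ 10

section .text
global _start
_start:
    mov   ebx, limit                ; loop counter; syscall does not modify RBX
.loop:
    mov   eax, 1                    ; write is call number 1 in the x86-64 ABI
    mov   edi, 1                    ; fd 1 = stdout
    lea   rsi, [rel print_str]      ; buffer address, RIP-relative
    mov   edx, print_str_len        ; byte count
    syscall                         ; clobbers RAX (return value), RCX and R11
    dec   ebx
    jnz   .loop                     ; repeat until the counter reaches zero

    mov   eax, 60                   ; exit is call number 60
    xor   edi, edi                  ; exit status 0
    syscall
```

Counting down to zero lets dec set the flags for jnz directly, so no separate cmp is needed.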
I am testing the use of .bss to allocate a memory area that holds a single number, then printing that number to the console. The output is not as expected: I am supposed to get a number (12), but get a newline.
System config:
$ uname -a
Linux 5.8.0-48-generic #54~20.04.1-Ubuntu SMP Sat Mar 20 13:40:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
description: CPU
product: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
The code:
# compile with: gcc -ggdb -nostdlib -no-pie test.s -o test
.bss
.lcomm output,1
.global _start
.text
_start:
# test .bss: store the number 12 in the memory allocated in .bss
mov $output, %rbx # rbx to hold address of allocated space
mov $12,%rdx # Move a number to rdx
mov %rdx,(%rbx) # Move content in rdx to the address where rbx points to (e.g ->output)
# setup for write syscall:
mov $1,%rax # system call for write, according to syscall table (http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/)
mov $1,%rdi # fd = 1, stdout
mov $output,%rsi # address of string to output moved to rsi
mov $1,%rdx # number of bytes to be written
syscall # should write 12 in console
mov $60,%rax
xor %rdi,%rdi
syscall # exit normally
I have set a breakpoint with the first syscall (using GDB), to look into the registers:
i r rax rbx rdx rdi rsi
rax 0x1 1
rbx 0x402000 4202496
rdx 0x1 1
rdi 0x1 1
rsi 0x402000 4202496
x/1 0x402000
0x402000 <output>: 12
The output after syscall is blank, was expected to get the number "12":
:~/Dokumenter/ASM/dec$ gcc -ggdb -nostdlib -no-pie test.s -o test
:~/Dokumenter/ASM/dec$ ./test
:~/Dokumenter/ASM/dec$ ./test
:~/Dokumenter/ASM/dec$
So, my question is: is there any obvious explanation of why I am getting a blank line and not 12?
mov $output,%rsi # address of string to output moved to rsi
^^^^^^
Address of string. The value $12 is not the character sequence "12". If you wanted to print the string 12, you would need to load 0x31 and 0x32 ('1' and '2') into the memory area (making it big enough), then use 2 as the length.
For example, movw $0x3231, output or better movw $0x3231, output(%rip) to use RIP-relative addressing for static data, like normal for x86-64. (Unlike NASM, GAS syntax doesn't support $'12' as a way to write the same integer constant.)
If you want to print an integer as a string, you'll probably want to manipulate it mathematically so you can do it one digit at a time. (Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf)
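The digit-at-a-time idea can be sketched as repeated division by 10, storing ASCII digits from the end of a buffer backwards. This example uses NASM syntax rather than the question's AT&T syntax; the buffer name buf, its size, and the trailing newline are assumptions made for illustration.

```nasm
; Assumed build: nasm -f elf64 print12.asm -o print12.o && ld print12.o -o print12
section .bss
buf:    resb 12                     ; up to 10 digits for a 32-bit value, plus newline

section .text
global _start
_start:
    mov   eax, 12                   ; hypothetical unsigned 32-bit value to print
    lea   rsi, [rel buf + 11]       ; RSI -> last byte of buf
    mov   byte [rsi], 10            ; trailing newline
    mov   ecx, 10                   ; divisor
.next_digit:
    xor   edx, edx                  ; div uses EDX:EAX, so clear the high half
    div   ecx                       ; EAX = quotient, EDX = remainder (0..9)
    add   dl, '0'                   ; remainder -> ASCII digit
    dec   rsi
    mov   [rsi], dl                 ; digits are produced least-significant first
    test  eax, eax
    jnz   .next_digit               ; keep going until the quotient is zero

    lea   rdx, [rel buf + 12]       ; one past the end of buf
    sub   rdx, rsi                  ; RDX = number of bytes to write
    mov   eax, 1                    ; write
    mov   edi, 1                    ; fd 1 = stdout
    syscall                         ; RSI already points at the first digit

    mov   eax, 60                   ; exit
    xor   edi, edi
    syscall
```

Filling the buffer from the end means the digits land in printing order without a separate reversal step.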
I'm experimenting with assembly language and wrote a program which prints 2 hardcoded bytes into stdout. Here it is:
section .text
global _start
_start:
mov eax, 0x0A31
mov [val], eax
mov eax, 4
mov ebx, 1
mov ecx, val
mov edx, 2
int 0x80
mov eax, 1
int 0x80
segment .bss
val resb 1; <------ Here
Note that I reserved only 1 byte inside the bss segment, but actually put 2 bytes (charcode for 1 and newline symbol) into the memory location. And the program worked fine. It printed 1 character and then newline.
But I expected a segmentation fault. Why didn't it occur? We reserved only 1 byte, but stored 2.
x86, like most other modern architectures, uses paging / virtual memory for memory protection. On x86 (again like many other architectures), the granularity is 4kiB.
A 4-byte store to val won't fault unless the linker happens to place it in the last 3 bytes of a page, and the next page is unmapped.
What actually happens is that you just overwrite whatever is after val. In this case, it's just unused space to the end of the page. If you had other static storage locations in the BSS, you'd step on their values. (Call them "variables" if you want, but the high-level concept of a "variable" doesn't just mean a memory location, a variable can be live in a register and never needs to have an address.)
Besides the wikipedia article linked above, see also:
How does x86 paging work? (internals of the page-table format, and how the OS manages it and the CPU reads it).
What is the state of the art in Memory Protection?
Is it safe to read past the end of a buffer within the same page on x86 and x64?
About the memory layout of programs in Linux
but actually put 2 bytes (charcode for 1 and newline symbol) into the memory location.
mov [val], eax is a 4-byte store. The operand-size is determined by the register. If you wanted to do a 2-byte store, use mov [val], ax.
Fun fact: MASM would warn or error about an operand-size mismatch, because it magically associates sizes with symbol names based on the declaration that reserves space after them. NASM stays out of your way and lets the register decide. But if you wrote mov [val], 0x0A31, it would be an error, because neither operand implies a size; you need mov dword [val], 0x0A31 (or word or byte).
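A minimal sketch of the question's program with the reservation and the store both sized to 2 bytes might look like this. It stays with the 32-bit int 0x80 ABI used in the question; the build commands in the comment are an assumption.

```nasm
; Assumed build: nasm -f elf32 val.asm -o val.o && ld -m elf_i386 val.o -o val
section .bss
val:    resb 2                      ; reserve both bytes that will be stored

section .text
global _start
_start:
    mov   word [val], 0x0A31        ; explicit 2-byte store: '1' (0x31), then newline (0x0A)
    mov   eax, 4                    ; 'write' system call
    mov   ebx, 1                    ; fd 1 = stdout
    mov   ecx, val                  ; address of the 2 bytes
    mov   edx, 2                    ; length
    int   0x80

    mov   eax, 1                    ; 'exit' system call
    xor   ebx, ebx                  ; exit status 0
    int   0x80
```

Because x86 is little-endian, the low byte 0x31 is stored first, so the output is the character 1 followed by a newline.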
Placing val at the end of a page to get a segfault
The BSS for some reason doesn't start at the beginning of a page in a 32-bit binary, but it is near the start of a page. You're not linking with anything else that would use up most of a page in the BSS. nm bss-no-segfault shows that it's at 0x080490a8, and a 4k page is 0x1000 bytes, so the last byte in the BSS mapping will be 0x08049fff.
It seems that the BSS start address changes when I add an instruction to the .text section, so presumably the linker's choices here are related to packing things into an ELF executable. It doesn't make much sense, because the BSS isn't stored in the file, it's just a base address + length. I'm not going down that rabbit hole; I'm sure there's a reason that making .text slightly larger results in a BSS that starts at the beginning of a page, but IDK what it is.
Anyway, if we construct the BSS so that val is right before the end of a page, we can get a fault:
... same .text
section .bss
dummy: resb 4096 - 0xa8 - 2
val: resb 1
;; could have done this instead of making up constants
;; ALIGN 4096
;; dummy2: resb 4094
;; val2: resb 1
Then build and run:
$ asm-link -m32 bss-no-segfault.asm
+ yasm -felf32 -Worphan-labels -gdwarf2 bss-no-segfault.asm
+ ld -melf_i386 -o bss-no-segfault bss-no-segfault.o
peter@volta:~/src/SO$ nm bss-no-segfault
080490a7 B __bss_start
080490a8 b dummy
080490a7 B _edata
0804a000 B _end <--------- End of the BSS
08048080 T _start
08049ffe b val <--------- Address of val
gdb ./bss-no-segfault
(gdb) b _start
(gdb) r
(gdb) set disassembly-flavor intel
(gdb) layout reg
(gdb) p &val
$2 = (<data variable, no debug info> *) 0x8049ffe
(gdb) si # and press return to repeat a couple times
mov [val], eax segfaults because it crosses into the unmapped page. mov [val], ax would work (because I put val 2 bytes before the end of the page).
At this point, /proc/<PID>/smaps shows:
... the r-x private mapping for .text
08049000-0804a000 rwxp 00000000 00:15 2885598 /home/peter/src/SO/bss-no-segfault
Size: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
...
[vvar] and [vdso] pages exported by the kernel for fast gettimeofday / getpid
Key things: rwxp means read/write/execute, and private. Even stopped before the first instruction, somehow it's already "dirty" (i.e. written to). So is the text segment, but that's expected from gdb changing the instruction to int3.
The 08049000-0804a000 (and 4 kB size of the mapping) shows us that the BSS only has 1 page mapped. There's no data segment, just text and BSS.
I'm new in asm, and trying to use some opcodes for getting my hands on it.
I'm working on Linux, 64 bits, and always get a segmentation fault when using movsb. I compile with nasm:
nasm -f elf64 test.asm
Here is the code
DEFAULT ABS
segment data
data:
texte: db 'Hello, World !!', 10, 13
len: equ $-texte
texteBis: db 'Hello, World !.', 10, 13
segment code
global main
main:
;The problem is here
mov rsi, texteBis
mov rdi, texte
mov cx, len
rep movsb
mov dx, len
mov rcx, texte
mov bx, 1
mov ax, 4
int 0x80
mov bx,0 ; exit code, 0=normal
mov ax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel
Another question: with a string (or other large db instance), should I use
mov rsi, texte
or
mov rsi, [texte]
I didn't understand which one gives the value and which one the address.
Do you also link?
ld -e main test.o -o test
Anyway, texteBis seems to be static data, in the data segment. That page is read-only, i.e. protected against writing and execution.
You should allocate a buffer (either on the stack or on the heap if you are allowed to use a runtime library).
Your problem is that you are writing to write-protected memory, i.e. the DATA section. Once your program gets loaded into the memory, the DATA section is actually on a read-only page. You have to use stack memory (or dynamically allocated memory) and use that as the destination of your string copy.
Example:
sub rsp, len ; move stack pointer down 'len' bytes
mov rsi, texteBis
mov rdi, rsp ; use address of stack pointer as dest.
xor rcx,rcx ; rcx = 0 (all 64 bits)
mov cx, len
rep movsb
That should fix your problem. As in C, it is important to allocate enough space or you will overwrite data on the stack.
Assigning values to registers
Another thing that I noticed is that you often write to sub-parts of registers, e.g.
mov dx, len
This is dangerous since the other parts of the register are not overwritten. Only the lowest 16 bits are written. Say rdx, a 64-bit register, was set to 0xffffffffffffffff. Then rdx would look like this after your move: 0xffffffffffff0011. The called code probably reads all of rdx and therefore interprets it as a length of 0xffffffffffff0011 bytes. Not what you want. Solution:
xor rdx,rdx
mov dx, len
or
mov rdx, len
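A third option is to write the 32-bit register, since on x86-64 a write to a 32-bit register zero-extends into the full 64-bit register. The short sketch below contrasts the two behaviours; the constant 0x11 stands in for len, and the register values in the comments can be checked by single-stepping in gdb.

```nasm
; Assumed build: nasm -f elf64 partial.asm -o partial.o && ld partial.o -o partial
section .text
global _start
_start:
    mov   rdx, -1                   ; RDX = 0xffffffffffffffff
    mov   dx, 0x11                  ; RDX = 0xffffffffffff0011 (upper 48 bits kept)
    mov   edx, 0x11                 ; RDX = 0x0000000000000011 (upper 32 bits zeroed)

    mov   eax, 60                   ; exit
    xor   edi, edi                  ; exit status 0
    syscall
```

The 32-bit form is also the shortest encoding, which is why mov edx, len is usually preferred over mov rdx, len for small constants.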
Tools that might help you later
Note, gdb will help you find where your error is happening and will also give you additional information (such as register values and stack values). Excerpt:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005bb in main ()
(gdb) disassemble
Dump of assembler code for function main:
0x00000000004005a6: sub $0x13,%rsp
0x00000000004005aa: mov -0x1c(%rip),%rsi # 0x400595
0x00000000004005b1: mov %rsp,%rdi
0x00000000004005b4: xor %cx,%cx
0x00000000004005b7: mov $0x11,%cx
=> 0x00000000004005bb: rep movsb %ds:(%rsi),%es:(%rdi)
0x00000000004005bd: mov $0x11,%dx
0x00000000004005c1: movabs $0x400584,%rcx
0x00000000004005cb: mov $0x1,%bx
0x00000000004005cf: mov $0x4,%ax
0x00000000004005d3: int $0x80
0x00000000004005d5: mov $0x0,%bx
0x00000000004005d9: mov $0x1,%ax
0x00000000004005dd: int $0x80
End of assembler dump.
(gdb) info registers rsi
rsi 0x57202c6f6c6c6548 6278066737626506568
Since nasm does not support a useful debugging format but it is often the case that you want to break on certain occasions, you can use the int3 instruction to raise a SIGTRAP at a certain point in the code:
mov eax, 10
int3 ; debugger will catch signal here
Hope that helps getting you started in assembly.
You don't need to use dynamic memory. Your data segment (or section) is read-only because it is not a standard section and you are not defining its attributes, so by default nasm marks it as a read-only data section.
Using objdump -h with your code outputs the following:
0 data 00000022 0000000000000000 0000000000000000 00000200 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 code 0000003c 0000000000000000 0000000000000000 00000230 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
If you change the names of your segments to .data and .text, the program runs perfectly and objdump outputs:
0 .data 00000022 0000000000000000 0000000000000000 00000200 2**2
CONTENTS, ALLOC, LOAD, DATA
1 .text 0000003c 0000000000000000 0000000000000000 00000230 2**4
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
Which are the correct attributes for what you intend with your sections.
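For reference, a sketch of the question's copy with the standard section names might look like this. It also shows the address form the question asks about: mov rsi, texte or lea rsi, [texte] gives the address, while mov rsi, [texte] would load the first 8 bytes of the string. Using _start and the 64-bit syscall ABI instead of main and int 0x80 is a substitution made for this example.

```nasm
; Assumed build: nasm -f elf64 copy.asm -o copy.o && ld copy.o -o copy
default rel
section .data                       ; standard name: writable, not executable
texte:      db 'Hello, World !!', 10, 13
len:        equ $ - texte
texteBis:   db 'Hello, World !.', 10, 13

section .text                       ; standard name: executable, not writable
global _start
_start:
    lea   rsi, [texteBis]           ; source address
    lea   rdi, [texte]              ; destination address, now on a writable page
    mov   ecx, len                  ; writing ECX zero-extends into RCX
    rep movsb                       ; copy RCX bytes (the ABI has DF clear at entry)

    mov   eax, 1                    ; write
    mov   edi, 1                    ; fd 1 = stdout
    lea   rsi, [texte]              ; print the overwritten buffer
    mov   edx, len
    syscall

    mov   eax, 60                   ; exit
    xor   edi, edi
    syscall
```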
To get more info on what the attributes mean, I recommend this page:
https://www.tortall.net/projects/yasm/manual/html/objfmt-elf-section.html
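If you would rather keep non-standard section names, the attributes can be spelled out on the section directive instead. The sketch below keeps the names data and code from the question and assumes NASM's ELF section qualifiers (progbits, alloc, exec/noexec, write/nowrite, align=); the counter and msg symbols are hypothetical, and the exact segment layout depends on how the linker places these sections.

```nasm
; Assumed build: nasm -f elf64 attrs.asm -o attrs.o && ld attrs.o -o attrs
section data progbits alloc noexec write align=4    ; explicitly writable data
msg:        db 'ok', 10
msg_len:    equ $ - msg
counter:    dd 0                    ; hypothetical variable, to exercise a store

section code progbits alloc exec nowrite align=16   ; explicitly executable code
global _start
_start:
    inc   dword [rel counter]       ; would fault if the section were read-only

    mov   eax, 1                    ; write
    mov   edi, 1                    ; fd 1 = stdout
    lea   rsi, [rel msg]
    mov   edx, msg_len
    syscall

    mov   eax, 60                   ; exit
    xor   edi, edi
    syscall
```

Running objdump -h on the object file is a quick way to confirm that the flags came out as intended.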
I am new to asm. I am trying to copy a pointer from a register to a .data variable using NASM, on linux 64-bit.
Consider this program:
section .data
ptr: dq 0
section .text
global _start
_start:
mov [ptr], rsp
mov rax, 60
mov rdi, 0
syscall
Here I try to copy the current stack pointer to ptr. ptr is declared as a quadword. Neither nasm nor the linker complains, but when debugging the program with gdb, I can see that both addresses are different:
gdb ./test.s
+(gdb) break _start
Breakpoint 1 at 0x4000b0
+(gdb) run
Starting program: test
Breakpoint 1, 0x00000000004000b0 in _start ()
+(gdb) nexti
0x00000000004000b8 in _start ()
+(gdb) info registers
...
rsp 0x7fffffffe460 0x7fffffffe460
...
+(gdb) x ptr
0xffffffffffffe460: Cannot access memory at address 0xffffffffffffe460
From what I understand, mov should copy all 64 bits from rsp to [ptr], but it seems that the most significant 0s are not copied and/or that there is some kind of sign extension, as if only the least significant bits were copied.
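The problem is, you don't have debug info for the ptr type, so gdb treats it as an integer. That means x ptr takes the low 32 bits of the stored value, sign-extends them, and uses the result as the address to examine, which is where 0xffffffffffffe460 comes from. You can examine its real contents using: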
(gdb) x/a &ptr
0x600124 <ptr>: 0x7fffffffe950
(gdb) p/a $rsp
$3 = 0x7fffffffe950
Of course I have a different value for rsp than you, but you can see that ptr and rsp match.
It looks to me like you're using gdb incorrectly:
section .data
ptr: dq 0
section .text
global main
main:
mov [ptr], rsp
ret
Compiling with:
rm -f test.o && nasm -f elf64 test.asm && gcc -m64 -o test test.o
Then my debugging session looks like this:
gdb ./test
(...)
(gdb) break main
Breakpoint 1 at 0x4004c0
(gdb) run
Starting program: /home/rr-/test
Breakpoint 1, 0x00000000004004c0 in main ()
(gdb) nexti
0x00000000004004c8 in main ()
(gdb) info registers
rax 0x4004c0 4195520
rbx 0x0 0
rcx 0x0 0
rdx 0x7fffffffe388 140737488348040
rsi 0x7fffffffe378 140737488348024
rdi 0x1 1
rbp 0x4004d0 0x4004d0 <__libc_csu_init>
rsp 0x7fffffffe298 0x7fffffffe298
(...)
(gdb) info addr ptr
Symbol "ptr" is at 0x600880 in a file compiled without debugging.
(gdb) x/g 0x600880
0x600880: 140737488347800
140737488347800 evaluates to 0x7FFFFFFFE298 just fine.
+(gdb) x/h ptr
h means half-word, which is two bytes. What you want is probably g (Giant words in GDB terminology, which is eight bytes).