GNU as equivalent to times X db 0 - nasm

I'm writing a small bootloader in GNU as and I need to make the binary output "BIOS-compatible". Here is how I do it in nasm:
...
times 510 - ($-$$) db 0
dw 0xAA55
But how can I do it in GNU as?

After some Google-searching, I figured out how to do it:
_start:
...
.fill 510 - (. - _start), 0
.word 0xAA55

Related

Trivial assembler program segfaults [duplicate]

I am trying to learn nasm. I want to make a program that prints "Hello, world." n times (in this case 10). I am trying to save the loop register value in a constant so that it is not changed when the body of the loop is executed. When I try to do this I receive a segmentation fault error. I am not sure why this is happening.
My code:
SECTION .DATA
print_str: db 'Hello, world.', 10
print_str_len: equ $-print_str
limit: equ 10
step: dw 1
SECTION .TEXT
GLOBAL _start
_start:
mov eax, 4 ; 'write' system call = 4
mov ebx, 1 ; file descriptor 1 = STDOUT
mov ecx, print_str ; string to write
mov edx, print_str_len ; length of string to write
int 80h ; call the kernel
mov eax, [step] ; moves the step value to eax
inc eax ; Increment
mov [step], eax ; moves the eax value to step
cmp eax, limit ; Compare sil to the limit
jle _start ; Loop while less or equal
exit:
mov eax, 1 ; 'exit' system call
mov ebx, 0 ; exit with error code 0
int 80h ; call the kernel
The result:
Hello, world.
Segmentation fault (core dumped)
The cmd:
nasm -f elf64 file.asm -o file.o
ld file.o -o file
./file
section .DATA is the direct cause of the crash. Lower-case section .data is special, and linked as a read-write (private) mapping of the executable. Section names are case-sensitive.
Upper-case .DATA is not special for nasm or the linker, and it ends up as part of the text segment mapped read+exec without write permission.
Upper-case .TEXT is also weird: by default objdump -drwC -Mintel only disassembles the .text section (to avoid disassembling data as if it were code), so it shows empty output for your executable.
On newer systems, the default for a section name NASM doesn't recognize doesn't include exec permission, so code in .TEXT will segfault. Same as Assembly section .code and .text behave differently
After starting the program under GDB (gdb ./foo, starti), I looked at the process's memory map from another shell.
$ cat /proc/11343/maps
00400000-00401000 r-xp 00000000 00:31 110651257 /tmp/foo
7ffff7ffa000-7ffff7ffd000 r--p 00000000 00:00 0 [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso]
7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0 [stack]
As you can see, other than the special VDSO mappings and the stack, there's only the one file-backed mapping, and it has read+exec permission only.
Single-stepping inside GDB, the mov eax,DWORD PTR ds:0x400086 load succeeds, but the mov DWORD PTR ds:0x400086,eax store faults. (See the bottom of the x86 tag wiki for GDB asm tips.)
From readelf -a foo, we can see the ELF program headers that tell the OS's program loader how to map it into memory:
$ readelf -a foo # broken version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000bf 0x00000000000000bf R 0x200000
Section to Segment mapping:
Segment Sections...
00 .DATA .TEXT
Notice how both .DATA and .TEXT are in the same segment. This is what you'd want for section .rodata (a standard section name where you should put read-only constant data like your string), but it won't work for mutable global variables.
After fixing your asm to use section .data and .text, readelf shows us:
$ readelf -a foo # fixed version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000e7 0x00000000000000e7 R E 0x200000
LOAD 0x00000000000000e8 0x00000000006000e8 0x00000000006000e8
0x0000000000000010 0x0000000000000010 RW 0x200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
Notice how segment 00 is R + E without W, and the .text section is in there. Segment 01 is RW (read + write) without exec, and the .data section is there.
The LOAD tag means they're mapped into the process's virtual address space. Some section (like debug info) aren't, and are just metadata for other tools. But NASM flags unknown section names as progbits, i.e. loaded, which is why it was able to link and have the load not segfault.
After fixing it to use section .data, your program runs without segfaulting.
The loop runs for one iteration, because the 2 bytes following step: dw 1 are not zero. After the dword load, RAX = 0x2c0001 on my system. (cmp between 0x002c0002 and 0xa makes the LE condition false because it's not less or equal.)
dw means "data word" or "define word". Use dd for a data dword.
BTW, there's no need to keep your loop counter in memory. You're not using RDI, RSI, RBP, or R8..R15 for anything so you could just keep it in a register. Like mov edi, limit before the loop, and dec edi / jnz at the bottom.
But actually you should use the 64-bit syscall ABI if you want to build 64-bit code, not the 32-bit int 0x80 ABI. What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?. Or build 32-bit executables if you're following a guide or tutorial written for that.
Anyway, in that case you'd be able to use ebx as your loop counter, because the syscall ABI uses different args for registers.

why not getting SIGSEGV when overflowing .data section [duplicate]

I'm experimenting with assembly language and wrote a program which prints 2 hardcoded bytes into stdout. Here it is:
section .text
global _start
_start:
mov eax, 0x0A31
mov [val], eax
mov eax, 4
mov ebx, 1
mov ecx, val
mov edx, 2
int 0x80
mov eax, 1
int 0x80
segment .bss
val resb 1; <------ Here
Note that I reserved only 1 byte inside the bss segment, but actually put 2 bytes (charcode for 1 and newline symbol) into the memory location. And the program worked fine. It printed 1 character and then newline.
But I expected segmentation fault. Why isn't it occured. We reserved only 1 byte, but put 2.
x86, like most other modern architectures, uses paging / virtual memory for memory protection. On x86 (again like many other architectures), the granularity is 4kiB.
A 4-byte store to val won't fault unless the linker happens to place it in the last 3 bytes of a page, and the next page is unmapped.
What actually happens is that you just overwrite whatever is after val. In this case, it's just unused space to the end of the page. If you had other static storage locations in the BSS, you'd step on their values. (Call them "variables" if you want, but the high-level concept of a "variable" doesn't just mean a memory location, a variable can be live in a register and never needs to have an address.)
Besides the wikipedia article linked above, see also:
How does x86 paging work? (internals of the page-table format, and how the OS manages it and the CPU reads it).
What is the state of the art in Memory Protection?
Is it safe to read past the end of a buffer within the same page on x86 and x64?
About the memory layout of programs in Linux
but actually put 2 bytes (charcode for 1 and newline symbol) into the memory location.
mov [val], eax is a 4-byte store. The operand-size is determined by the register. If you wanted to do a 2-byte store, use mov [val], ax.
Fun fact: MASM would warn or error about an operand-size mismatch, because it magically associates sizes with symbol names based on the declaration that reserves space after them. NASM stays out of your way, so if you wrote mov [val], 0x0A31, it would be an error. Neither operand implies a size, so you need mov dword [val], 0x0A31 (or word or byte).
Placing val at the end of a page to get a segfault
The BSS for some reason doesn't start at the beginning of a page in a 32-bit binary, but it is near the start of a page. You're not linking with anything else that would use up most of a page in the BSS. nm bss-no-segfault shows that it's at 0x080490a8, and a 4k page is 0x1000 bytes, so the last byte in the BSS mapping will be 0x08049fff.
It seems that the BSS start address changes when I add an instruction to the .text section, so presumably the linker's choices here are related to packing things into an ELF executable. It doesn't make much sense, because the BSS isn't stored in the file, it's just a base address + length. I'm not going down that rabbit hole; I'm sure there's a reason that making .text slightly larger results in a BSS that starts at the beginning of a page, but IDK what it is.
Anyway, if we construct the BSS so that val is right before the end of a page, we can get a fault:
... same .text
section .bss
dummy: resb 4096 - 0xa8 - 2
val: resb 1
;; could have done this instead of making up constants
;; ALIGN 4096
;; dummy2: resb 4094
;; val2: resb
Then build and run:
$ asm-link -m32 bss-no-segfault.asm
+ yasm -felf32 -Worphan-labels -gdwarf2 bss-no-segfault.asm
+ ld -melf_i386 -o bss-no-segfault bss-no-segfault.o
peter#volta:~/src/SO$ nm bss-no-segfault
080490a7 B __bss_start
080490a8 b dummy
080490a7 B _edata
0804a000 B _end <--------- End of the BSS
08048080 T _start
08049ffe b val <--------- Address of val
gdb ./bss-no-segfault
(gdb) b _start
(gdb) r
(gdb) set disassembly-flavor intel
(gdb) layout reg
(gdb) p &val
$2 = (<data variable, no debug info> *) 0x8049ffe
(gdb) si # and press return to repeat a couple times
mov [var], eax segfaults because it crosses into the unmapped page. mov [var], ax would works (because I put var 2 bytes before the end of the page).
At this point, /proc/<PID>/smaps shows:
... the r-x private mapping for .text
08049000-0804a000 rwxp 00000000 00:15 2885598 /home/peter/src/SO/bss-no-segfault
Size: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
...
[vvar] and [vdso] pages exported by the kernel for fast gettimeofday / getpid
Key things: rwxp means read/write/execute, and private. Even stopped before the first instruction, somehow it's already "dirty" (i.e. written to). So is the text segment, but that's expected from gdb changing the instruction to int3.
The 08049000-0804a000 (and 4 kB size of the mapping) shows us that the BSS only has 1 page mapped. There's no data segment, just text and BSS.

Segmentation fault with a variable in SECTION .DATA

I am trying to learn nasm. I want to make a program that prints "Hello, world." n times (in this case 10). I am trying to save the loop register value in a constant so that it is not changed when the body of the loop is executed. When I try to do this I receive a segmentation fault error. I am not sure why this is happening.
My code:
SECTION .DATA
print_str: db 'Hello, world.', 10
print_str_len: equ $-print_str
limit: equ 10
step: dw 1
SECTION .TEXT
GLOBAL _start
_start:
mov eax, 4 ; 'write' system call = 4
mov ebx, 1 ; file descriptor 1 = STDOUT
mov ecx, print_str ; string to write
mov edx, print_str_len ; length of string to write
int 80h ; call the kernel
mov eax, [step] ; moves the step value to eax
inc eax ; Increment
mov [step], eax ; moves the eax value to step
cmp eax, limit ; Compare sil to the limit
jle _start ; Loop while less or equal
exit:
mov eax, 1 ; 'exit' system call
mov ebx, 0 ; exit with error code 0
int 80h ; call the kernel
The result:
Hello, world.
Segmentation fault (core dumped)
The cmd:
nasm -f elf64 file.asm -o file.o
ld file.o -o file
./file
section .DATA is the direct cause of the crash. Lower-case section .data is special, and linked as a read-write (private) mapping of the executable. Section names are case-sensitive.
Upper-case .DATA is not special for nasm or the linker, and it ends up as part of the text segment mapped read+exec without write permission.
Upper-case .TEXT is also weird: by default objdump -drwC -Mintel only disassembles the .text section (to avoid disassembling data as if it were code), so it shows empty output for your executable.
On newer systems, the default for a section name NASM doesn't recognize doesn't include exec permission, so code in .TEXT will segfault. Same as Assembly section .code and .text behave differently
After starting the program under GDB (gdb ./foo, starti), I looked at the process's memory map from another shell.
$ cat /proc/11343/maps
00400000-00401000 r-xp 00000000 00:31 110651257 /tmp/foo
7ffff7ffa000-7ffff7ffd000 r--p 00000000 00:00 0 [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso]
7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0 [stack]
As you can see, other than the special VDSO mappings and the stack, there's only the one file-backed mapping, and it has read+exec permission only.
Single-stepping inside GDB, the mov eax,DWORD PTR ds:0x400086 load succeeds, but the mov DWORD PTR ds:0x400086,eax store faults. (See the bottom of the x86 tag wiki for GDB asm tips.)
From readelf -a foo, we can see the ELF program headers that tell the OS's program loader how to map it into memory:
$ readelf -a foo # broken version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000bf 0x00000000000000bf R 0x200000
Section to Segment mapping:
Segment Sections...
00 .DATA .TEXT
Notice how both .DATA and .TEXT are in the same segment. This is what you'd want for section .rodata (a standard section name where you should put read-only constant data like your string), but it won't work for mutable global variables.
After fixing your asm to use section .data and .text, readelf shows us:
$ readelf -a foo # fixed version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000e7 0x00000000000000e7 R E 0x200000
LOAD 0x00000000000000e8 0x00000000006000e8 0x00000000006000e8
0x0000000000000010 0x0000000000000010 RW 0x200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
Notice how segment 00 is R + E without W, and the .text section is in there. Segment 01 is RW (read + write) without exec, and the .data section is there.
The LOAD tag means they're mapped into the process's virtual address space. Some section (like debug info) aren't, and are just metadata for other tools. But NASM flags unknown section names as progbits, i.e. loaded, which is why it was able to link and have the load not segfault.
After fixing it to use section .data, your program runs without segfaulting.
The loop runs for one iteration, because the 2 bytes following step: dw 1 are not zero. After the dword load, RAX = 0x2c0001 on my system. (cmp between 0x002c0002 and 0xa makes the LE condition false because it's not less or equal.)
dw means "data word" or "define word". Use dd for a data dword.
BTW, there's no need to keep your loop counter in memory. You're not using RDI, RSI, RBP, or R8..R15 for anything so you could just keep it in a register. Like mov edi, limit before the loop, and dec edi / jnz at the bottom.
But actually you should use the 64-bit syscall ABI if you want to build 64-bit code, not the 32-bit int 0x80 ABI. What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?. Or build 32-bit executables if you're following a guide or tutorial written for that.
Anyway, in that case you'd be able to use ebx as your loop counter, because the syscall ABI uses different args for registers.

Adding 64 bit asm instructions to 32bit asm code

I am planning to write the smallest binary file in terms of size in binary. I am following the instructions as given here. I have successfully managed to reduce the size of my binary using 32 bit asm to 64 bytes. Now I want to convert the 32 bit assembly language to 64 bit language.
This is the code that I tried:
; tiny.asm
BITS 32
org 0x00200000
db 0x7F, "ELF" ; e_ident
db 1, 1, 1, 0
_start:
mov bl, 42 ;These are the lines where
xor eax, eax ;I want to insert the 64 bit
inc eax ;instructions.
int 0x80 ;
db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
phdr: dd 1 ; e_shoff ; p_type
dd 0 ; e_flags ; p_offset
dd $$ ; e_ehsize ; p_vaddr
; e_phentsize
dw 1 ; e_phnum ; p_paddr
dw 0 ; e_shentsize
dd filesize ; e_shnum ; p_filesz
; e_shstrndx
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
filesize equ $ - $$
As per the code I did have some wild guesses. However, the compiler is expecting 32 bit instructions when I tried to tinker with it giving registers and syscall of 64 bit asm.
The code I used to execute the program is :
$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
$ wc -c a.out
64 a.out
Any idea or any sort of lead in this will be really appreciated. Thanks.

Has anyone been able to create a hybrid of PE COFF and ELF?

I mean could a single binary file run in both Win32 and Linux i386 ?
This is not possible, because the two types have conflicting formats:
The initial two characters of a PE file must be 'M' 'Z';
The initial four characters of an ELF file must be '\x7f' 'E' 'L' 'F'.
Clearly, you can't create one file that satisifies both formats.
In response to the comment about a polyglot binary valid as both a 16 bit COM file and a Linux ELF file, that's possible (although really a COM file is a DOS program, not Windows - and certainly not Win32).
Here's one I knocked together - compile it with NASM. It works because the first two bytes of an ELF file ('\x7f' 'E') happen to also be valid 8086 machine code (a 45 byte relative jump-if-greater-than instruction). Minimal ELF headers cribbed from Brian Raiter.
BITS 32
ORG 0x08048000
ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
times 0x47-($-$$) db 0
; DOS COM File code
BITS 16
mov dx, msg1 - $$ + 0x100
mov ah, 0x09
int 0x21
mov ah, 0x00
int 0x21
msg1: db `Hello World (DOS).\r\n$`
BITS 32
phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr
; Linux ELF code
_start:
mov eax, 4 ; SYS_write
mov ebx, 1 ; stdout
mov ecx, msg2
mov edx, msg2_len
int 0x80
mov eax, 1 ; SYS_exit
mov ebx, 0
int 0x80
msg2: db `Hello World (Linux).\n`
msg2_len equ $ - msg2
filesize equ $ - $$
The two formats are sufficiently different that a hybrid is unlikely.
However, Linux supports loading different executable formats by "interpreter". This way compiled .exe files containing CIL (compiled C# or other .NET languages) can be executed directly under Linux, for example.
Sure. Use Java.

Resources