I've been writing ELF binaries using NASM, and I created a segment with the read-only flag turned on. Running the program causes a segfault. I tested the program in replit, and it ran just fine so what's the problem? I created a regular NASM hello world program with the hello world string inside the .rodata section and that ran fine. I checked the binary with readelf to make sure the string was in a read only segment.
The only solution I've come up with is to set the executable flag in the rodata segment so it has read / execute permissions, but that's hacky and I'd like the rodata segment to be read-only.
This is the code for the ELF-64 hello world.
; hello.asm
[bits 64]
[org 0x400000]
fileHeader:
db 0x7F, "ELF"
db 2 ; ELF-64
db 1 ; little endian
db 1 ; ELF version
db 0 ; System V ABI
db 0 ; ABI version
db 0, 0, 0, 0, 0, 0, 0 ; unused
dw 2 ; executable object file
dw 0x3E ; x86-64
dd 1 ; ELF version
dq text ; entry point
dq 64 ; program header table offset
dq nullSection - $$ ; section header table offset
dd 0 ; flags
dw 64 ; size of file header
dw 56 ; size of program header
dw 3 ; program header count
dw 64 ; size of section header
dw 4 ; section header count
dw 3 ; section header string table index
nullSegment:
times 56 db 0
textSegment:
dd 1 ; loadable segment
dd 0x4 ; read / execute permissions
dq text - $$ ; segment offset
dq text ; virtual address of segment
dq 0 ; physical address of segment
dq textSize ; size of segment in file
dq textSize ; size of segment in memory
dq 0x1000 ; alignment
rodataSegment:
dd 1 ; loadable segment
dd 0x4 ; read permission (setting this flag to 0x5 causes the program to run just fine)
dq rodata - $$ ; segment offset
dq rodata ; virtual address of segment
dq 0 ; physical address of segment
dq rodataSize ; size of segment in file
dq rodataSize ; size of segment in memory
dq 0x1000 ; alignment
text:
mov rax, 1
mov rdi, 1
mov rsi, message
mov rdx, messageLength
syscall
mov rax, 60
xor rdi, rdi
syscall
textSize equ $ - text
rodata:
message db "Hello world!", 0xA, 0
messageLength equ $ - message
rodataSize equ $ - rodata
stringTable:
db 0
db ".text", 0
db ".rodata", 0
db ".shstrtab", 0
stringTableSize equ $ - stringTable
nullSection:
times 64 db 0
textSection:
dd 1 ; index into string table
dd 1 ; program data
dq 0x6 ; occupies memory & executable
dq text ; virtual address of section
dq text - $$ ; offset of section in file
dq textSize ; size of section in file
dq 0 ; unused
dq 0x1000 ; alignment
dq 0 ; unused
rodataSection:
dd 7 ; index into string table
dd 1 ; program data
dq 0x2 ; occupies memory
dq rodata ; virtual address of section
dq rodata - $$ ; offset of section in file
dq rodataSize ; size of section in file
dq 0 ; unused
dq 0x1000 ; no alignment
dq 0 ; unused
stringTableSection:
dd 15 ; index into string table
dd 3 ; string table
dq 0 ; no attributes
dq stringTable ; virtual address of section
dq stringTable - $$ ; offset of section in file
dq stringTableSize ; size of section in file
dq 0 ; unused
dq 0 ; no alignment
dq 0 ; unused
replitHello.asm: https://hastebin.com/ujanoguveq.properties // it should be nearly the same line for line
This is the minimal nasm hello world program.
; helloNasm.asm
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, message
mov rdx, messageLength
syscall
mov rax, 60
xor rdi, rdi
syscall
section .rodata
message db "Hello NASM!", 0xA, 0
messageLength equ $ - message
textSegment:
dd 1 ; loadable segment
dd 0x4 ; read / execute permissions
I assume you meant 0x5 for flags above.
With that fixed, I see the following segments:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
NULL 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 0
LOAD 0x0000e8 0x00000000004000e8 0x0000000000000000 0x000025 0x000025 R E 0x1000
LOAD 0x00010d 0x000000000040010d 0x0000000000000000 0x00000e 0x00000e R 0x1000
This asks the kernel to perform two mmaps at the same address (0x400000). The second of these mmaps maps over the first one, resulting in the following /proc/$pid/maps:
00400000-00401000 r--p 00000000 fe:02 22548440 /tmp/t
7ffff7ff9000-7ffff7ffd000 r--p 00000000 00:00 0 [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso]
7ffffffdd000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
As you can see, the program text is not executable, and as a result the program SIGSEGVs on the very first instruction:
(gdb) run
Starting program: /tmp/t
Program received signal SIGSEGV, Segmentation fault.
0x00000000004000e8 in ?? ()
(gdb) x/i $pc
=> 0x4000e8: mov $0x1,%eax
To fix this, you must move one of the segments to a different page (as Jester correctly noted).
Also note that sections are completely unnecessary (only segments matter). Setting A X flags in the .text section in particular has no effect on anything.
Related
I have the following 'uppercaser.asm' assembly program in NASM which converts all lowercase letters input from user into uppercase:
section .bss
Buff resb 1
section .data
section .text
global _start
_start:
nop ; This no-op keeps the debugger happy
Read: mov eax,3 ; Specify sys_read call
mov ebx,0 ; Specify File Descriptor 0: Standard Input
mov ecx,Buff ; Pass offset of the buffer to read to
mov edx,1 ; Tell sys_read to read one char from stdin
int 80h ; Call sys_read
cmp eax,0 ; Look at sys_read's return value in EAX
je Exit ; Jump If Equal to 0 (0 means EOF) to Exit
; or fall through to test for lowercase
cmp byte [Buff],61h ; Test input char against lowercase 'a'
jb Write ; If below 'a' in ASCII chart, not lowercase
cmp byte [Buff],7Ah ; Test input char against lowercase 'z'
ja Write ; If above 'z' in ASCII chart, not lowercase
; At this point, we have a lowercase character
sub byte [Buff],20h ; Subtract 20h from lowercase to give uppercase...
; ...and then write out the char to stdout
Write: mov eax,4 ; Specify sys_write call
mov ebx,1 ; Specify File Descriptor 1: Standard output
mov ecx,Buff ; Pass address of the character to write
mov edx,1 ; Pass number of chars to write
int 80h ; Call sys_write...
jmp Read ; ...then go to the beginning to get another character
Exit: mov eax,1 ; Code for Exit Syscall
mov ebx,0 ; Return a code of zero to Linux
int 80H ; Make kernel call to exit program
The program is then assembled with the -g -F stabs option for the debugger and linked for 32-bit executables in ubuntu 18.04.
Running readelf --segments uppercaser for the segments and readelf -S uppercaser for the sections I see a difference in size of text segment and text section.
readelf --segments uppercaser
Elf file type is EXEC (Executable file)
Entry point 0x8048080
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x08048000 0x08048000 0x000db 0x000db R E 0x1000
LOAD 0x0000dc 0x080490dc 0x080490dc 0x00000 0x00004 RW 0x1000
Section to Segment mapping:
Segment Sections...
00 .text
01 .bss
readelf -S uppercaser
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 08048080 000080 00005b 00 AX 0 0 16
[ 2] .bss NOBITS 080490dc 0000dc 000004 00 WA 0 0 4
[ 3] .stab PROGBITS 00000000 0000dc 000120 0c 4 0 4
[ 4] .stabstr STRTAB 00000000 0001fc 000011 00 0 0 1
[ 5] .comment PROGBITS 00000000 00020d 00001f 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 00022c 00003e 00 0 0 1
[ 7] .symtab SYMTAB 00000000 0003d4 0000f0 10 8 11 4
[ 8] .strtab STRTAB 00000000 0004c4 000045 00 0 0 1
In the sections description one can see that the size of .text section is 5Bh=91 bytes (the same number one is getting with the size command) whereas in the segments description we see that the size is 0x000DB, a difference of 128 bytes. Why is that?
From the elf man pages for the Elf32_Phdr (program header) structure:
p_filesz
This member holds the number of bytes in the file image of
the segment. It may be zero.
p_memsz
This member holds the number of bytes in the memory image
of the segment. It may be zero.
Is the difference somehow related to the .bss section?
Notice that the first program segment at file address 0 starts at virtual address 0x08048000, not at VA 0x08048080 which corresponds with the .text section.
In fact the segment displayed by readelf as 00 .text covers ELF file header (52 bytes), alignment, two program headers (2*32 bytes) and the netto contents of .text section, alltogether mapped from file address 0 to VA 0x08048000.
I am trying to learn nasm. I want to make a program that prints "Hello, world." n times (in this case 10). I am trying to save the loop register value in a constant so that it is not changed when the body of the loop is executed. When I try to do this I receive a segmentation fault error. I am not sure why this is happening.
My code:
SECTION .DATA
print_str: db 'Hello, world.', 10
print_str_len: equ $-print_str
limit: equ 10
step: dw 1
SECTION .TEXT
GLOBAL _start
_start:
mov eax, 4 ; 'write' system call = 4
mov ebx, 1 ; file descriptor 1 = STDOUT
mov ecx, print_str ; string to write
mov edx, print_str_len ; length of string to write
int 80h ; call the kernel
mov eax, [step] ; moves the step value to eax
inc eax ; Increment
mov [step], eax ; moves the eax value to step
cmp eax, limit ; Compare sil to the limit
jle _start ; Loop while less or equal
exit:
mov eax, 1 ; 'exit' system call
mov ebx, 0 ; exit with error code 0
int 80h ; call the kernel
The result:
Hello, world.
Segmentation fault (core dumped)
The cmd:
nasm -f elf64 file.asm -o file.o
ld file.o -o file
./file
section .DATA is the direct cause of the crash. Lower-case section .data is special, and linked as a read-write (private) mapping of the executable. Section names are case-sensitive.
Upper-case .DATA is not special for nasm or the linker, and it ends up as part of the text segment mapped read+exec without write permission.
Upper-case .TEXT is also weird: by default objdump -drwC -Mintel only disassembles the .text section (to avoid disassembling data as if it were code), so it shows empty output for your executable.
On newer systems, the default for a section name NASM doesn't recognize doesn't include exec permission, so code in .TEXT will segfault. Same as Assembly section .code and .text behave differently
After starting the program under GDB (gdb ./foo, starti), I looked at the process's memory map from another shell.
$ cat /proc/11343/maps
00400000-00401000 r-xp 00000000 00:31 110651257 /tmp/foo
7ffff7ffa000-7ffff7ffd000 r--p 00000000 00:00 0 [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso]
7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0 [stack]
As you can see, other than the special VDSO mappings and the stack, there's only the one file-backed mapping, and it has read+exec permission only.
Single-stepping inside GDB, the mov eax,DWORD PTR ds:0x400086 load succeeds, but the mov DWORD PTR ds:0x400086,eax store faults. (See the bottom of the x86 tag wiki for GDB asm tips.)
From readelf -a foo, we can see the ELF program headers that tell the OS's program loader how to map it into memory:
$ readelf -a foo # broken version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000bf 0x00000000000000bf R 0x200000
Section to Segment mapping:
Segment Sections...
00 .DATA .TEXT
Notice how both .DATA and .TEXT are in the same segment. This is what you'd want for section .rodata (a standard section name where you should put read-only constant data like your string), but it won't work for mutable global variables.
After fixing your asm to use section .data and .text, readelf shows us:
$ readelf -a foo # fixed version
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000e7 0x00000000000000e7 R E 0x200000
LOAD 0x00000000000000e8 0x00000000006000e8 0x00000000006000e8
0x0000000000000010 0x0000000000000010 RW 0x200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
Notice how segment 00 is R + E without W, and the .text section is in there. Segment 01 is RW (read + write) without exec, and the .data section is there.
The LOAD tag means they're mapped into the process's virtual address space. Some section (like debug info) aren't, and are just metadata for other tools. But NASM flags unknown section names as progbits, i.e. loaded, which is why it was able to link and have the load not segfault.
After fixing it to use section .data, your program runs without segfaulting.
The loop runs for one iteration, because the 2 bytes following step: dw 1 are not zero. After the dword load, RAX = 0x2c0001 on my system. (cmp between 0x002c0002 and 0xa makes the LE condition false because it's not less or equal.)
dw means "data word" or "define word". Use dd for a data dword.
BTW, there's no need to keep your loop counter in memory. You're not using RDI, RSI, RBP, or R8..R15 for anything so you could just keep it in a register. Like mov edi, limit before the loop, and dec edi / jnz at the bottom.
But actually you should use the 64-bit syscall ABI if you want to build 64-bit code, not the 32-bit int 0x80 ABI. What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?. Or build 32-bit executables if you're following a guide or tutorial written for that.
Anyway, in that case you'd be able to use ebx as your loop counter, because the syscall ABI uses different args for registers.
I have the following bootloader code which seems to run perfectly fine on a hard disk:
[bits 16]
[org 0x7c00]
bootld_start:
KERNEL_OFFSET equ 0x2000
xor ax, ax ; Explicitly set ES = DS = 0
mov ds, ax
mov es, ax
mov bx, 0x8C00 ; Set SS:SP to 0x8C00:0x0000 . The stack will exist
; between 0x8C00:0x0000 and 0x8C00:0xFFFF
mov ss, bx
mov sp, ax
mov [BOOT_DRIVE], dl
mov bx, boot_msg
call print_string
mov dl, [BOOT_DRIVE]
call disk_load
jmp pm_setup
jmp $
BOOT_DRIVE:
db 0
disk_load:
mov si, dap
mov ah, 0x42
int 0x13
;cmp al, 4
;jne disk_error_132
ret
dap:
db 0x10 ; Size of DAP
db 0
; You can only read 46 sectors into memory between 0x2000 and
; 0x7C00. Don't read anymore or we overwrite the bootloader we are
; executing from. (0x7c00-0x2000)/512 = 46
dw 46 ; Number of sectors to read
dw KERNEL_OFFSET ; Offset
dw 0 ; Segment
dd 1
dd 0
disk_error_132:
mov bx, disk_error_132_msg
call print_string
jmp $
disk_error_132_msg:
db 'Error! Error! Something is VERY wrong! (0x132)', 0
gdt_start:
gdt_null:
dd 0x0
dd 0x0
gdt_code:
dw 0xffff
dw 0x0
db 0x0
db 10011010b
db 11001111b
db 0x0
gdt_data:
dw 0xffff
dw 0x0
db 0x0
db 10010010b
db 11001111b
db 0x0
gdt_end:
gdt_descriptor:
dw gdt_end - gdt_start
dd gdt_start
CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start
boot_msg:
db 'OS is booting files... ', 0
done_msg:
db 'Done! ', 0
%include "boot/print_string.asm"
pm_setup:
mov bx, done_msg
call print_string
mov ax, 0
mov ss, ax
mov sp, 0xFFFC
mov ax, 0
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
cli
lgdt[gdt_descriptor]
mov eax, cr0
or eax, 0x1
mov cr0, eax
jmp CODE_SEG:b32
[bits 32]
VIDEO_MEMORY equ 0xb8000
WHITE_ON_BLACK equ 0x0f
print32:
pusha
mov edx, VIDEO_MEMORY
.loop:
mov al, [ebx]
mov ah, WHITE_ON_BLACK
cmp al, 0
je .done
mov [edx], ax
add ebx, 1
add edx, 2
jmp .loop
.done:
popa
ret
b32:
mov ax, DATA_SEG
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
; Place stack below EBDA in lower memory
mov ebp, 0x9c000
mov esp, ebp
mov ebx, pmode_msg
call print32
call KERNEL_OFFSET
jmp $
pmode_msg:
db 'Protected mode enabled!', 0
kernel:
mov ebx, pmode_msg
call print32
jmp $
pmode_tst:
db 'Testing...'
times 510-($-$$) db 0
db 0x55
db 0xAA
The problem is that when I convert it to an ISO with these commands:
mkdir iso
mkdir iso/boot
cp image.flp iso/boot/boot
xorriso -as mkisofs -R -J -c boot/bootcat \
-b boot/boot -no-emul-boot -boot-load-size 4 \
-o image.iso iso
...it fails with a triple fault. When I run it with qemu-system-i386 -boot d -cdrom os-image.iso -m 512 -d int -no-reboot -no-shutdown, it outputs (excluding useless SMM exceptions):
check_exception old: 0xffffffff new 0xd
0: v=0d e=0000 i=0 cpl=0 IP=0008:0000000000006616
pc=0000000000006616
SP=0010:000000000009bff8 env->regs[R_EAX]=0000000000000000
EAX=00000000 EBX=00007d72 ECX=00000000 EDX=000000e0
ESI=00007cb0 EDI=00000010 EBP=0009c000 ESP=0009bff8
EIP=00006616 EFL=00000083 [--S---C] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 00007c73 00000018
IDT= 00000000 000003ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=000000e0 CCD=000001b3 CCO=ADDB
EFER=0000000000000000
check_exception old: 0xd new 0xd
1: v=08 e=0000 i=0 cpl=0 IP=0008:0000000000006616 pc=0000000000006616 SP=0010:000000000009bff8 env- >regs[R_EAX]=0000000000000000
EAX=00000000 EBX=00007d72 ECX=00000000 EDX=000000e0
ESI=00007cb0 EDI=00000010 EBP=0009c000 ESP=0009bff8
EIP=00006616 EFL=00000083 [--S---C] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 00007c73 00000018
IDT= 00000000 000003ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=000000e0 CCD=000001b3 CCO=ADDB
EFER=0000000000000000
check_exception old: 0x8 new 0xd
Which means that I got a 0x0d (general protection fault), then a 0x08 (double fault), then it triple faulted. Why is this happening?
EDIT: I have changed the command to:
xorriso -as mkisofs -R -J -c boot/bootcat -b boot/boot.flp -o nmos.iso nmos.flp
But I am now getting the following error:
xorriso : FAILURE : Cannot find in ISO image: -boot_image ... bin_path='/boot/boot.flp'
xorriso : NOTE : -return_with SORRY 32 triggered by problem severity FAILURE
Does anyone know what this means?
EDIT 2:
I have changed the code to read using ah=0x02 like this:
mov bx, KERNEL_OFFSET
mov ah, 0x02
mov al, 46
mov ch, 0x00
mov dh, 0x00
mov cl, 0x02
mov dl, [BOOT_DRIVE]
int 0x13
But it is still triple-faulting. Why?
I am the developer of xorriso. If image.flp is a floppy disk image
with MBR, possibly a partition table, and a filesystem, then the hint
of Michael goes to the right direction. El Torito specifies emulations
which lets the boot image file appear to the BIOS as floppy or hard disk.
The options -no-emul-boot -boot-load-size 4 causes BIOS to load the
first 2048 bytes of file image.flp and to execute them as x86 program.
Obviously a floppy image is not suitable as plain program.
According to mkisofs traditions floppy emulation is the default with
option -b. So you would just have to remove the option -no-emul-boot
from your xorriso command line in order to get the El Torito boot image
as floppy. (-boot-load-size 4 is then obsolete, too.)
The floppy image must have either 2400, or 2880, or 5760 sectors of 512
bytes, or else it will be rejected by xorriso.
Images of other sizes may be emulated as hard disks where the first
(and only) partition entry in the MBR partition table tells the size of
the disk. xorriso -as mkisofs option -hard-disk-boot chooses this emulation.
The primary cause of all the triple faults in your question really come down to the fact that your kernel isn't being loaded properly into the memory at 0x0000:0x2000. When you transfer control to this location with a JMP you end up running what happens to be in the memory region and the CPU executes until it hits an instruction that causes a fault.
Bootable CDs are strange beasts that have a number of different modes, and there are many BIOSes that boot such CDs but they too may have their own quirks. When you use -no-emul-boot with XORRISO you are requesting the disk neither be treated as a floppy nor hard disk. You could remove -no-emul-boot -boot-load-size 4 that should generate an ISO that gets treated as a floppy. The problem with that is many real BIOSes, Emulators (BOCHs and QEMU) and Virtual machines do not support Int 13h/AH=42h extended disk reads when the CD is booted using floppy emulation. You may be forced to use regular disk read via Int 13h/AH=02h.
You should be able to use extended disk reads via Int 13h/AH=42h if you use -no-emul-boot -boot-load-size 4 but it will require some changes to your bootloader. When using -no-emul-boot -boot-load-size 4 CDROMs sector sizes are 2048 bytes, not 512. This will require a bit of modification to your bootloader and kernel. The -boot-load-size 4 writes information to the ISO that informs the BIOS to read 4 512-byte chunks from the beginning of the disk image inside the ISO. The 0xaa55 boot signature is no longer needed.
If you use -no-emul-boot there is one other snag that needs to be dealt with. On the CD-ROM LBA 0 isn't where the disk image gets placed in the final ISO. The question is, how can you get the LBA where the disk image is in the ISO? You can have XORRISO write this information into a special section of the bootloader you create, and you enable this feature with -boot-info-table.
Creating the special section at the beginning of the bootloader is relatively easy. In the El Torito Specification Supplement they mention this:
EL TORITO BOOT INFORMATION TABLE
...
The format of this table is as follows; all integers are in sec-
tion 7.3.1 ("little endian") format.
Offset Name Size Meaning
8 bi_pvd 4 bytes LBA of primary volume descriptor
12 bi_file 4 bytes LBA of boot file
16 bi_length 4 bytes Boot file length in bytes
20 bi_csum 4 bytes 32-bit checksum
24 bi_reserved 40 bytes Reserved
The 32-bit checksum is the sum of all the 32-bit words in the
boot file starting at byte offset 64. All linear block addresses
(LBAs) are given in CD sectors (normally 2048 bytes).
This is talking about the 56 bytes at offset 8 of the virtual disk we create holding our bootloader. If we modify the top of your bootloader code to look like this we effectively create a blank boot information table:
start:
jmp bootld_start
times 8-($-$$) db 0 ; Pad out first 8 bytes
; Boot info table
bi_pvd dd 0
bi_file dd 0
bi_kength dd 0
bi_csum dd 0
bi_reserved times 40 db 0 ; 40 bytes reserved
When using XORRISO with -boot-info-table this table will be filled in once the ISO is generated. bi_file is the important piece of information we will need since it is the LBA where our disk image is placed inside the ISO. We can use this to fill in the Disk Access Packet used by extended disk reads to read from the proper location of the ISO.
To make the DAP a little more readable and to account for 2048 byte sectors I've amended it to look like:
dap:
dap_size: db 0x10 ; Size of DAP
dap_zero db 0
; You can only read 11 2048 byte sectors into memory between 0x2000 and
; 0x7C00. Don't read anymore or we overwrite the bootloader we are
; executing from. (0x7c00-0x2000)/2048 = 11 (rounded down)
dap_numsec: dw 11 ; Number of sectors to read
dap_offset: dw KERNEL_OFFSET ; Offset
dap_segment: dw 0 ; Segment
dap_lba_low: dd 0
dap_lba_high:dd 0
One issue is that the LBA placed into the Boot Information table is from the start of the disk image (sector with our bootloader). We need to increment that LBA by 1 and place it into the DAP so we are using the LBA where our kernel starts. Using 32-bit instruction we can just read the 32-bit value from the Boot Information Table, add 1 and save it to the DAP. If using strictly 16-bit instructions add one to a 32-bit value is more complex. Since we are going into 386 protected mode we can assume instruction with 32-bit operands are supported in real mode. The code to update the DAP with the LBA of the kernel could look like:
mov ebx, [bi_file] ; Get LBA of our disk image in ISO
inc ebx ; Add sector to get LBA for start of kernel
mov [dap_lba_low], ebx ; Update DAP with LBA of kernel in the ISO
The only other issue is that the bootloader sector needs to be padded out to 2048 (the size of a CD-ROM sector) rather than 512 and we can remove the boot signature. Change:
times 510-($-$$) db 0
db 0x55
db 0xAA
To:
times 2048-($-$$) db 0
The modified bootloader code could look like:
[bits 16]
[org 0x7c00]
KERNEL_OFFSET equ 0x2000
start:
jmp bootld_start
times 8-($-$$) db 0 ; Pad out first 8 bytes
; Boot info table
bi_pvd dd 0
bi_file dd 0
bi_kength dd 0
bi_csum dd 0
bi_reserved times 40 db 0 ; 40 bytes reserved
bootld_start:
xor ax, ax ; Explicitly set ES = DS = 0
mov ds, ax
mov es, ax
mov bx, 0x8C00 ; Set SS:SP to 0x8C00:0x0000 . The stack will exist
; between 0x8C00:0x0000 and 0x8C00:0xFFFF
mov ss, bx
mov sp, ax
mov ebx, [bi_file] ; Get LBA of our disk image in ISO
inc ebx ; Add sector to get LBA for start of kernel
mov [dap_lba_low], ebx ; Update DAP with LBA of kernel in the ISO
mov [BOOT_DRIVE], dl
mov bx, boot_msg
call print_string
mov dl, [BOOT_DRIVE]
call disk_load
jmp pm_setup
jmp $
BOOT_DRIVE:
db 0
disk_load:
mov si, dap
mov ah, 0x42
int 0x13
;cmp al, 4
;jne disk_error_132
ret
dap:
dap_size: db 0x10 ; Size of DAP
dap_zero db 0
; You can only read 11 2048 byte sectors into memory between 0x2000 and
; 0x7C00. Don't read anymore or we overwrite the bootloader we are
; executing from. (0x7c00-0x2000)/2048 = 11 (rounded down)
dap_numsec: dw 11 ; Number of sectors to read
dap_offset: dw KERNEL_OFFSET ; Offset
dap_segment: dw 0 ; Segment
dap_lba_low: dd 0
dap_lba_high:dd 0
disk_error_132:
mov bx, disk_error_132_msg
call print_string
jmp $
disk_error_132_msg:
db 'Error! Error! Something is VERY wrong! (0x132)', 0
gdt_start:
gdt_null:
dd 0x0
dd 0x0
gdt_code:
dw 0xffff
dw 0x0
db 0x0
db 10011010b
db 11001111b
db 0x0
gdt_data:
dw 0xffff
dw 0x0
db 0x0
db 10010010b
db 11001111b
db 0x0
gdt_end:
gdt_descriptor:
dw gdt_end - gdt_start
dd gdt_start
CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start
boot_msg:
db 'OS is booting files... ', 0
done_msg:
db 'Done! ', 0
%include "boot/print_string.asm"
pm_setup:
mov bx, done_msg
call print_string
mov ax, 0
mov ss, ax
mov sp, 0xFFFC
mov ax, 0
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
cli
lgdt[gdt_descriptor]
mov eax, cr0
or eax, 0x1
mov cr0, eax
jmp CODE_SEG:b32
[bits 32]
VIDEO_MEMORY equ 0xb8000
WHITE_ON_BLACK equ 0x0f
print32:
pusha
mov edx, VIDEO_MEMORY
.loop:
mov al, [ebx]
mov ah, WHITE_ON_BLACK
cmp al, 0
je .done
mov [edx], ax
add ebx, 1
add edx, 2
jmp .loop
.done:
popa
ret
b32:
mov ax, DATA_SEG
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
; Place stack below EBDA in lower memory
mov ebp, 0x9c000
mov esp, ebp
mov ebx, pmode_msg
call print32
call KERNEL_OFFSET
jmp $
pmode_msg:
db 'Protected mode enabled!', 0
kernel:
mov ebx, pmode_msg
call print32
jmp $
pmode_tst:
db 'Testing...'
times 2048-($-$$) db 0
You can then modify your original XORRISO command to be:
xorriso -as mkisofs -R -J -c boot/bootcat \
-b boot/boot -no-emul-boot -boot-load-size 4 \
-boot-info-table -o image.iso iso
Coming from Portable Executable and Windows I am trying to create a tiny ELF executable with NASM in Linux (Ubuntu AMD64). My little piece of code looks like this:
BITS 64
; The default virtual address where the executable is linked
_START_VAD equ 0x400000
; Here I write the header byte by byte
ELF_HEADER:
.ident_magic db 0x7f, "ELF"
.ident_class db 2
.ident_data db 1
.ident_version db 1
.ident_osabi db 0
.ident_abiversion db 0
; Pad with zeros until we have 16 bytes for the identifier
times 16-$+ELF_HEADER db 0
.object_type dw 2
.machine dw 0x3e
.version dd 1
.entry_point dq _START_VAD + START
.progr_header dq PROGRAM_HEADER
.sect_header dq SECTIONS_HEADER
.proc_flags dd 0
.header_size dw ELF_HEADER_END - ELF_HEADER
.progr_header_size dw 56
.progr_header_entries dw 3
.sect_header_size dw 64
.sect_header_entries dw 4
.sect_header_index dw 3
ELF_HEADER_END:
PROGRAM_HEADER:
HEADER_SEGMENT_HEADER:
HEADER_SEGMENT_SIZE equ PROGRAM_HEADER_END - PROGRAM_HEADER
.header_segment_type dd 6
.header_segment_flag dd 0x1 | 0x4
.header_segment_offs dq PROGRAM_HEADER
.header_segment_vadr dq _START_VAD + PROGRAM_HEADER
.header_segment_padr dq _START_VAD + PROGRAM_HEADER
.header_segment_file dq HEADER_SEGMENT_SIZE
.header_segment_mems dq HEADER_SEGMENT_SIZE
.header_segment_alig dq 8
INTEPRETER_SEGMENT_HEADER:
INTERP_SEGMENT_SIZE equ INTERPRETER_SEGMENT_END - INTERPRETER_SEGMENT
.interp_segment_type dd 3
.interp_segment_flag dd 0x4
.interp_segment_offs dq INTERPRETER_SEGMENT
.interp_segment_vadr dq _START_VAD + INTERPRETER_SEGMENT
.interp_segment_padr dq _START_VAD + INTERPRETER_SEGMENT
.interp_segment_file dq INTERP_SEGMENT_SIZE
.interp_segment_mems dq INTERP_SEGMENT_SIZE
.interp_segment_alig dq 1
CODE_SEGMENT_HEADER:
CODE_SEGMENT_SIZE equ START_END - ELF_HEADER
.code_segment_type dd 1
.code_segment_flag dd 0x1 | 0x4
.code_segment_offs dq 0
.code_segment_vadr dq _START_VAD
.code_segment_padr dq _START_VAD
.code_segment_file dq CODE_SEGMENT_SIZE
.code_segment_mems dq CODE_SEGMENT_SIZE
.code_segment_alig dq 0x200000
PROGRAM_HEADER_END:
INTERPRETER_SEGMENT:
.intepreter_path db "/lib64/ld-linux-x86-64.so.2", 0
INTERPRETER_SEGMENT_END:
START:
mov rax, 1 ; sys_write
mov rdi, 1 ; stdout
mov rsi, _START_VAD + message ; message address
mov rdx, length ; message string length
syscall
mov rax, 60 ; sys_exit
mov rdi, 0 ; return 0 (success)
syscall
message:
db 'Hello, world!',0x0A ; message and newline
length: equ $-message ; message length calculation
START_END:
SECTIONS_STRING_TABLE:
NULL_SECTION_NAME: db 0
STR_TABLE_SECTION_NAME: db ".shstrtab", 0
INTERP_SECTION_NAME: db ".interp", 0
TEXT_SECTION_NAME: db ".text", 0
SECTIONS_STRING_TABLE_END:
SECTIONS_HEADER:
RESERVED_SECTION:
.reserved_sh_name dd 0
.reserved_type dd 0
.reserved_flags dq 0
.reserved_vaddr dq 0
.reserved_offs dq 0
.reserved_size dq 0
.reserved_link dd 0
.reserved_info dd 0
.reserved_alig dq 0
.reserved_ents dq 0
INTERPRETER_SECTION:
.reserved_sh_name dd INTERP_SECTION_NAME - SECTIONS_STRING_TABLE
.reserved_type dd 1
.reserved_flags dq 0x2
.reserved_vaddr dq _START_VAD + INTERPRETER_SEGMENT
.reserved_offs dq INTERPRETER_SEGMENT
.reserved_size dq INTERPRETER_SEGMENT_END - INTERPRETER_SEGMENT
.reserved_link dd 0
.reserved_info dd 0
.reserved_alig dq 1
.reserved_ents dq 0
TEXT_SECTION:
.reserved_sh_name dd TEXT_SECTION_NAME - SECTIONS_STRING_TABLE
.reserved_type dd 1
.reserved_flags dq 0x2 | 0x4
.reserved_vaddr dq _START_VAD + START
.reserved_offs dq START
.reserved_size dq START_END - START
.reserved_link dd 0
.reserved_info dd 0
.reserved_alig dq 16
.reserved_ents dq 0
STRINGTABLE_SECTION:
.reserved_sh_name dd STR_TABLE_SECTION_NAME - SECTIONS_STRING_TABLE
.reserved_type dd 3
.reserved_flags dq 0
.reserved_vaddr dq 0
.reserved_offs dq SECTIONS_STRING_TABLE
.reserved_size dq SECTIONS_STRING_TABLE_END - SECTIONS_STRING_TABLE
.reserved_link dd 0
.reserved_info dd 0
.reserved_alig dq 1
.reserved_ents dq 0
SECTIONS_HEADER_END:
I convert this into an ELF file with NASM and get this info with READELF:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400104
Start of program headers: 64 (bytes into file)
Start of section headers: 363 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 3
Size of section headers: 64 (bytes)
Number of section headers: 4
Section header string table index: 3
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 00000000004000e8 000000e8
000000000000001c 0000000000000000 A 0 0 1
[ 2] .text PROGBITS 0000000000400104 00000104
000000000000004e 0000000000000000 AX 0 0 16
[ 3] .shstrtab STRTAB 0000000000000000 00000152
0000000000000019 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000000a8 0x00000000000000a8 R E 8
INTERP 0x00000000000000e8 0x00000000004000e8 0x00000000004000e8
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000152 0x0000000000000152 R E 200000
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .text
There is no dynamic section in this file.
There are no relocations in this file.
There are no unwind sections in this file.
No version information found in this file.
of which I am not sure if is correct, specially the LOAD segment and its size.
When I run this I get Segmentation Fault (core dumped). A debugger will tell me more:
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7de5be2 in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb)
I don't know what segments or sections are missing or what offsets/addresses are wrong. Maybe someone with more experience in ELF and Linux loaders will give me a little push because I am really stuck.
I don't know what segments or sections are missing or what offsets/addresses are wrong.
Your binary has PT_INTERP segment, which makes the Linux kernel think it's a dynamically linked binary. But it has no PT_DYNAMIC segment, which ld-linux expects.
Removing all references to INTERPRETER_{SEGMENT,SECTION} and adjusting number of program headers and sections turns this into a fully-static binary, that works:
nasm t.s
./t
Hello, world!
readelf -l t
Elf file type is EXEC (Executable file)
Entry point 0x4000b0
There are 2 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x0000000000000070 0x0000000000000070 R E 8
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000fe 0x00000000000000fe R E 200000
Section to Segment mapping:
Segment Sections...
00
01 .text
I mean could a single binary file run in both Win32 and Linux i386 ?
This is not possible, because the two types have conflicting formats:
The initial two characters of a PE file must be 'M' 'Z';
The initial four characters of an ELF file must be '\x7f' 'E' 'L' 'F'.
Clearly, you can't create one file that satisifies both formats.
In response to the comment about a polyglot binary valid as both a 16 bit COM file and a Linux ELF file, that's possible (although really a COM file is a DOS program, not Windows - and certainly not Win32).
Here's one I knocked together - compile it with NASM. It works because the first two bytes of an ELF file ('\x7f' 'E') happen to also be valid 8086 machine code (a 45 byte relative jump-if-greater-than instruction). Minimal ELF headers cribbed from Brian Raiter.
BITS 32
ORG 0x08048000
ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
times 0x47-($-$$) db 0
; DOS COM File code
BITS 16
mov dx, msg1 - $$ + 0x100
mov ah, 0x09
int 0x21
mov ah, 0x00
int 0x21
msg1: db `Hello World (DOS).\r\n$`
BITS 32
phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr
; Linux ELF code
_start:
mov eax, 4 ; SYS_write
mov ebx, 1 ; stdout
mov ecx, msg2
mov edx, msg2_len
int 0x80
mov eax, 1 ; SYS_exit
mov ebx, 0
int 0x80
msg2: db `Hello World (Linux).\n`
msg2_len equ $ - msg2
filesize equ $ - $$
The two formats are sufficiently different that a hybrid is unlikely.
However, Linux supports loading different executable formats by "interpreter". This way compiled .exe files containing CIL (compiled C# or other .NET languages) can be executed directly under Linux, for example.
Sure. Use Java.