Creating a bootable ISO image with custom bootloader

I am trying to convert a bootloader I wrote in Assembly Language to an ISO image file. The following is the code from MikeOS bootloader. Here is my bootloader code:
BITS 16
start:
mov ax, 07C0h ; Set up 4K stack space after this bootloader
add ax, 288 ; (4096 + 512) / 16 bytes per paragraph
mov ss, ax
mov sp, 4096
mov ax, 07C0h ; Set data segment to where we're loaded
mov ds, ax
mov si, text_string ; Put string position into SI
call print_string ; Call our string-printing routine
jmp $ ; Jump here - infinite loop!
text_string db 'This is my cool new OS!', 0
print_string: ; Routine: output string in SI to screen
mov ah, 0Eh ; int 10h 'print char' function
.repeat:
lodsb ; Get character from string
cmp al, 0
je .done ; If char is zero, end of string
int 10h ; Otherwise, print it
jmp .repeat
.done:
ret
times 510-($-$$) db 0 ; Pad remainder of boot sector with 0s
dw 0xAA55 ; The standard PC boot signature
I typed the following command:
nasm -f bin -o boot.bin boot.asm
This command works fine and it gives a .bin output. Next I typed the following command:
dd if=boot.bin of=floppy.img count=1 bs=512
This also worked fine and gave me the .img output file. When I type this command:
dd if=boot.bin of=floppy.img skip seek=1 count=1339
I get the following error: dd: unrecognized operand ‘skip’. I read in the dd documentation that the skip operand must have a number assigned to it. What number should I use with skip (e.g. skip=1)?
Next I type the following command:
mkdosfs -C floppy.img 1440
I get the following error: mkdosfs: unable to create floppy.img. How do I fix the problems I am encountering? Is there another easier way I could convert my bootloader .bin file to an ISO image?

It appears you found your example for creating a bootable ISO image from this StackOverflow Answer. Unfortunately you picked an accepted answer that is incorrect in many ways. Pretend you never saw that answer.
Most Linux distros ship a program called either genisoimage or mkisofs. genisoimage is a fork of mkisofs, and on many distros mkisofs is simply a symlink to it, so whichever you have can be substituted in the examples below. My examples assume the ISO creation utility is called genisoimage.
In your question you have some bootloader code in a file called boot.asm. You correctly assemble this to a boot sector binary image with:
nasm -f bin -o boot.bin boot.asm
This creates boot.bin which is your boot sector. The next step is to create a floppy disk image and place boot.bin in the first sector. You can do that with this:
dd if=/dev/zero of=floppy.img bs=1024 count=1440
dd if=boot.bin of=floppy.img seek=0 count=1 conv=notrunc
The first command makes a zero-filled disk image the size of a 1.44MB floppy (1024*1440 = 1,474,560 bytes). The second command places boot.bin into the first sector of floppy.img without truncating the rest of the file. seek=0 seeks to the first sector of the output file (512 bytes is dd's default block size). count=1 copies only 1 sector (512 bytes) from boot.bin. conv=notrunc tells dd to leave the rest of the output file intact instead of truncating it after the write.
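For readers who want to see what the two dd commands amount to, here is a minimal Python sketch of the same steps. The function name build_floppy and the extra signature check are my own additions for illustration, not part of dd or of the original answer.

```python
SECTOR = 512                  # dd's default block size
FLOPPY_SIZE = 1440 * 1024     # same size as bs=1024 count=1440


def build_floppy(boot_sector: bytes) -> bytes:
    """Return a zero-filled 1.44MB image with boot_sector placed in sector 0."""
    if len(boot_sector) != SECTOR:
        raise ValueError("boot sector must be exactly 512 bytes")
    # dw 0xAA55 is stored little-endian, so the file ends with 0x55, 0xAA
    if boot_sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0xAA55 boot signature")
    image = bytearray(FLOPPY_SIZE)    # like dd if=/dev/zero of=floppy.img ...
    image[0:SECTOR] = boot_sector     # like seek=0 count=1 conv=notrunc
    return bytes(image)
```

To use it, read boot.bin in binary mode, pass the bytes to build_floppy, and write the result to floppy.img.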
After building a disk image as shown above, you can create an ISO image with these commands:
mkdir iso
cp floppy.img iso/
genisoimage -quiet -V 'MYOS' -input-charset iso8859-1 -o myos.iso -b floppy.img \
-hide floppy.img iso/
The commands above first create a sub-directory called iso that will contain the files to be placed onto the final CD-ROM image. The second command doesn't do anything more than copy our floppy.img into iso directory because we need that for booting. The third command does the heavy lifting and builds the ISO image.
-V 'MYOS' sets the volume label (It can be whatever you want)
-input-charset iso8859-1 sets the character set being used. Don't change it
-o myos.iso says the ISO image will be output to the file myos.iso
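-b floppy.img says that our ISO will be bootable and the boot image being used is the file floppy.img. The path is relative to the iso/ directory, and for floppy emulation the image must be exactly the size of a 1200, 1440 or 2880 kB floppy.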
-hide floppy.img isn't needed but it hides the boot image from the final ISO's directory listing. If you were to mount this ISO and do an ls on it to list the files, floppy.img wouldn't appear.
iso/ on the end of the command is the directory that will be used to build the ISO image from. It needs to at least contain our bootable floppy image, but you can place any other files you wish into the iso/ directory.
The ISO image myos.iso that is generated can be booted. An example of using QEMU to launch such an image:
qemu-system-i386 -cdrom ./myos.iso

For CDs there's a specification ("El Torito") that describes how bootable CDs work. The first 16 (2048-byte) sectors are unused; there's a "boot catalogue" that the firmware uses to decide which boot loader it should use (so you can have a single CD that boots very different systems, e.g. PC BIOS, UEFI, PowerPC, etc.); then there are the boot loaders themselves.
For "PC BIOS" alone, there's 3 possibilities:
emulate a floppy disk (using a "floppy disk image" stored on CD)
emulate a hard disk (using a "hard disk image" stored on CD)
no emulation
The first 2 options are mostly for compatibility purposes only (crusty old OSs that don't support booting from CDs, like MS-DOS), and they have performance implications (e.g. to emulate loading one 512-byte virtual sector, the firmware has to load a real 2048-byte sector and throw away the excess 1536 bytes). Any OS designed/written in the last 15+ years should be using "no emulation".
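The sector arithmetic behind that emulation cost can be sketched in a few lines of Python. The function names and the image_start_lba parameter are hypothetical, chosen only to illustrate the mapping; real firmware does this internally.

```python
CD_SECTOR = 2048      # real sector size on the CD
VIRT_SECTOR = 512     # sector size the emulated floppy/hard disk exposes


def locate_virtual_sector(virtual_lba: int, image_start_lba: int = 0):
    """Map an emulated 512-byte sector to (CD sector, byte offset inside it)."""
    byte_pos = virtual_lba * VIRT_SECTOR
    return image_start_lba + byte_pos // CD_SECTOR, byte_pos % CD_SECTOR


def wasted_bytes_per_read() -> int:
    """Bytes read from the CD and discarded for one emulated sector."""
    return CD_SECTOR - VIRT_SECTOR
```

Four virtual sectors share each real sector, so reading them one at a time fetches the same 2048 bytes four times unless the firmware caches them.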
For "no emulation":
The firmware loads your entire boot loader (which can be up to about 512 KiB) and not just one sector
Sectors on CDs are 2048 bytes (and not 512 bytes), and should be loaded via "int 0x13 extensions" (and not the old/limited "CHS disk functions" that you'd use for floppy)
There is no need for a BIOS Parameter Block (which should be considered mandatory for floppy disks)
There is no need for a partition table (which should be considered mandatory for hard disks, including GPT)
You'll probably want to support ISO9660 as the file system (to find the kernel and/or other files that the boot loader needs to load) and not FAT.
Also note that (in general) for "PC BIOS" you're probably going to want 5 different boot loaders (one for floppy, one for "MBR partitioned" hard disk, one for "GPT partitioned" hard disk, one for CD, and one for network boot). These cases are all different enough (and the "one 512-byte initial sector only" limitation for 3 of these cases is restrictive enough) to make the "all devices supported by one boot loader" idea a disaster.
To actually generate the ISO; you can use an existing tool (e.g. mkisofs), or you can write your own tool (ISO9660 and "El Torito" are both relatively easy to understand, and writing your own tool to generate an ISO can be done in less than 2 days, which is like a drop in the ocean for OS development projects).
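As a small taste of how simple the structures are, here is a Python sketch that finds the boot catalogue in an existing ISO. It relies on my reading of the El Torito layout (a boot record volume descriptor in sector 17, with the catalogue's sector number as a little-endian dword at byte 0x47); the function name is my own and this is not a full parser.

```python
import struct

CD_SECTOR = 2048
BOOT_RECORD_LBA = 17                        # sector holding the boot record
EL_TORITO_ID = b"EL TORITO SPECIFICATION"   # boot system identifier


def boot_catalog_lba(iso: bytes):
    """Return the boot catalogue's sector number, or None if not found."""
    start = BOOT_RECORD_LBA * CD_SECTOR
    desc = iso[start:start + CD_SECTOR]
    if len(desc) < CD_SECTOR:
        return None
    # byte 0: descriptor type 0 (boot record); bytes 1-5: "CD001"
    if desc[0] != 0 or desc[1:6] != b"CD001":
        return None
    if desc[7:7 + len(EL_TORITO_ID)] != EL_TORITO_ID:
        return None
    (lba,) = struct.unpack_from("<I", desc, 0x47)
    return lba
```

Running this on the bytes of an ISO built with -b should return a sector number, while a non-bootable ISO should give None.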

Related

How to turn hex code into x86 instructions

I'm trying to make a script or program that will take given bytes (given in hexadecimal) and convert them into x86 instructions (for example, c3 -> retq).
I've tried doing it by calling gcc -c on an assembly file just containing
retq
retq
and then using a script to insert bytes where it says "c3 c3", then using objdump -d to see what it says now. But it seems that it messes up the format of the file unless I only pass an instruction of the same size as the original instruction bytes.
I'm running it on a Raspbian Pi (A linux based operating system) using SSH, BASH terminal. I'm using BASH shell scripts and python, as well as the tools listed here, and gdb.

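To disassemble a flat binary file: objdump -D -b binary -m i386 foo.bin (use -m i386:x86-64 for 64-bit code such as retq). Alternatively, create an object file using .byte directives in assembly source, e.g. put .byte 0xc3 into foo.s, then run gcc -c foo.s followed by objdump -d foo.o. On an ARM host such as a Raspberry Pi, both approaches need binutils built with x86 support.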
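Since the question mentions Python, the first approach only needs a helper that turns the hex text into a flat binary file. This is a minimal sketch; the function names are my own.

```python
def hex_to_bytes(text: str) -> bytes:
    """Turn 'c3 c3', 'c3c3' or '0xc3, 0xc3' style input into raw bytes."""
    cleaned = text.replace("0x", "").replace(",", " ")
    return bytes.fromhex(cleaned)


def write_flat_binary(text: str, path: str) -> int:
    """Write the decoded bytes to path and return how many were written."""
    data = hex_to_bytes(text)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

After write_flat_binary("c3 c3", "foo.bin"), the objdump command for flat binaries can be run on foo.bin.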

How to find the PHDR of dynamically linked/loaded libraries from a kernel module?

I need to access the program header tables (or alternatively to the section headers) of a process from the kernel in order to find the addresses of .eh_frame and .eh_frame_hdr sections from a linux kernel module. In userspace I would use dl_iterate_phdr(), but I need a kernel-space solution. If possible, it would not need to go through the elf files.
The auxiliary vector has the AT_PHDR field, but it does not help to find the PHDRs of dynamically linked/loaded libraries.
My other idea was to iterate on the vm_areas to find the PHDR address from every file that has an executable mmap in the task's memory. The problem with this solution is that the elf file can be changed or deleted after load.
Is there a way to do this that relies only on memory and not on the elf file?

It looks like the ELF header (which holds the file offset of the phdr table, often the same as its offset in memory) is always at the beginning of executable mmaps. This does not seem fully reliable, as I could not find any documentation guaranteeing that the Ehdr is mapped, but it is present in practice. This is probably because the Ehdr must be at the beginning of ELF files, and page size and alignment make the executable segment start at file offset 0x0. Binaries linked with -z separate-code (the default in recent GNU ld on x86 Linux) appear to be an exception: there the header sits in a read-only mapping at offset 0x0 and the executable mapping starts at a later offset.
We can verify that executable mappings start at offset 0x0 for all running processes and loaded shared object with this bash line:
sudo cat /proc/*/maps | awk '{ print $2 " " $3 " " $6;}' | egrep '^..x.' | grep -vE '.... 0{8}'
It outputs all the executable mappings that do not start at offset 0x0, so no output means that the Ehdrs are at the beginning of executable vm_areas.
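The same check can be sketched in Python on the text of a maps file, which makes it easier to apply to a single process. The function name is my own, and unlike the shell pipeline this version skips anonymous mappings (lines with no pathname).

```python
def exec_mappings_with_nonzero_offset(maps_text: str):
    """List (offset, path) for file-backed executable mappings not at offset 0."""
    found = []
    for line in maps_text.splitlines():
        # fields: address, perms, offset, dev, inode, pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue                  # anonymous mapping: no pathname
        _addr, perms, offset, _dev, _inode, path = fields
        if "x" in perms and int(offset, 16) != 0:
            found.append((offset, path.strip()))
    return found
```

Feed it the contents of /proc/<pid>/maps; an empty list corresponds to "no output" from the shell one-liner.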

Is it possible to assemble and run raw CPU instructions using `as`?

There are a couple of related questions here.
Consider a program consisting only of the following two instructions
movq 1, %rax
cpuid
If I throw this into a file called Foo.asm, and run as Foo.asm, where as is the portable GNU assembler, I will get a file called a.out, of size 665 bytes on my system.
If I then chmod 700 a.out and try ./a.out, I will get an error saying cannot execute binary file.
1. Why is the file so large, if I am merely trying to translate two asm instructions into binary?
2. Why can the binary not be executed? I am providing valid instructions, so I would expect the CPU to be able to execute them.
3. How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
4. Once I have the answer to 3, how can I get my processor to execute them? (Assuming that I am not running privileged instructions.)

Why is the file so large, if I am merely trying to translate two asm instructions into binary?
Because the assembler creates a relocatable object file, which includes additional information such as sections and symbol tables.
Why can the binary not be executed?
Because it is a relocatable object file, not a loadable file. You need to link it to make it executable so that it can be loaded by the operating system:
$ ld -o Foo a.out
You also need to give the linker a hint about where your program starts, by specifying the _start symbol.
But then, still, the Foo executable is larger than you might expect since it still contains additional information (e.g. the elf header) required by the operating system to actually launch the program.
Also, if you launch the executable now, it will result in a segmentation fault, since you are loading the contents of address 1, which is not mapped into your address space, into rax. Still, if you fix this, the program will run into undefined code at the end - you need to make sure to gracefully exit the program through a syscall.
A minimal running example (assumed x86_64 architecture) would look like
.globl _start
_start:
movq $1, %rax
cpuid
mov $60, %rax # System-call "sys_exit"
mov $0, %rdi # exit code 0
syscall
How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
You can use objcopy to generate a raw binary image from an object file:
$ objcopy -O binary a.out Foo.bin
Then, Foo.bin will only contain the instruction opcodes.
nasm has a -f bin option which creates a binary-only representation of your assembly code. I used this to implement a bare boot loader for VirtualBox (warning: undocumented, prototype only!) to directly launch binary code inside a VirtualBox image without an operating system.
Once I have the answer to 3, how can I get my processor to execute them?
You will not be able to directly execute the raw binary file under Linux. You will need to write your own loader for that or not use an operating system at all. For an example, see my bare boot loader link above - this writes the opcodes into the boot loader of a VirtualBox disc image, so that the instructions are getting executed when launching the VirtualBox machine.

The old MS-DOS COM file format does not include a header. It really only contains the binary executable code. The code size cannot, however, exceed 64 KB. I don't know whether Linux can execute these.

You can write the opcodes into a file using a hex editor. Then you just need to wrap it in an ELF header so that Linux knows how to execute it.
Here's an example:
hexedit myfile.bin
Now just write your opcodes inside the file using the hexeditor.
After that you need to add the ELF header. You could do this by hand and write the ELF header into your .bin file, but that is a bit tricky. The easiest method is to use a few commands (in this example for 64-bit).
objcopy --input-target=binary --output-target=elf64-x86-64 myfile.bin myfile.o
ld -o myfile myfile.o -T binary.ld
You will also need a linker script, which I called binary.ld in this example.
These are the contents of binary.ld:
ENTRY(_start);
SECTIONS
{
_start = 0x0;
}
Now you can execute your program: ./myfile

Perhaps there's something like exe2bin utility for the GNU tool set. I've used various versions of exe2bin with Microsoft tools, and the ARM toolkit has the ability to produce binaries, but I don't recall if it was directly from the linked output or something like exe2bin.

How to convert PE(Portable Executable) format to ELF in linux

What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for linux(this gives ELF), and using 'i586-mingw32msvc-gcc' for Windows(this gives a PE binary).
I want to analyze these two binaries for similarities, using Bitblaze's static analysis tool - vine(http://bitblaze.cs.berkeley.edu/vine.html)
Now vine doesn't have a good support for PE binaries, so I wanted to convert PE->ELF, and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
Thanks

It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, due to the missing operating system.
Here's one method of doing it.
Summary
Dump the section headers of the EXE file.
Extract the raw section data from the EXE.
Encapsulate the raw section data in GNU linker script snippets.
Write a linker script to build an ELF binary, including those scripts from the previous step.
Run ld with the linker script to produce the ELF file.
Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe: file format pei-i386
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 AUTO          00172600  00401000  00401000  00000400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .idata        00001400  00574000  00574000  00172a00  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 DGROUP        0002b600  00576000  00576000  00173e00  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          000e7800  005a2000  005a2000  00000000  2**2
                  ALLOC
  4 .reloc        00013000  0068a000  0068a000  0019f400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .rsrc         00000a00  0069d000  0069d000  001b2400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note that I've converted the file offsets and section sizes from hex to decimal to use as skip and count, but I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks of 512 bytes).
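The hex-to-block conversion is easy to get wrong by hand, so here is a small Python sketch of the arithmetic. The helper name dd_args is my own; it only computes the numbers and does not run dd.

```python
BLOCK = 512   # the bs= value used with dd


def dd_args(file_offset: int, size: int):
    """Convert a section's file offset and size into dd (skip, count) blocks."""
    if file_offset % BLOCK or size % BLOCK:
        raise ValueError("offset and size must be multiples of the block size")
    return file_offset // BLOCK, size // BLOCK
```

For the AUTO section, dd_args(0x400, 0x172600) gives (2, 2963), and for DGROUP, dd_args(0x173e00, 0x2b600) gives (2975, 347), matching the two dd commands. If a section is not block-aligned, a block size of 1 with the raw byte values still works, just more slowly.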
Encapsulate the raw section data in GNU ld linker scripts snippets (using the BYTE directive). This will be used to populate the sections.
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
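If hexdump's format strings feel opaque, the same BYTE() snippet can be produced with a short Python function. This is a sketch of an equivalent, with a function name of my own choosing.

```python
def to_ld_bytes(data: bytes) -> str:
    """Emit one GNU ld BYTE() statement per input byte, like the hexdump call."""
    return "".join(f"BYTE(0x{b:02X})\n" for b in data)
```

Writing to_ld_bytes(open("code.bin", "rb").read()) to code.ld should give the same file as the hexdump pipeline.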
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
.text 0x401000 :
{
INCLUDE "code.ld";
}
.data 0x576000 :
{
INCLUDE "data.ld";
}
.bss 0x5A2000 :
{
. = . + 0x0E7800;
}
}
Run the linker script with GNU ld to produce the ELF file. Note I have to use an emulation mode elf_i386 since I'm using 64-bit Linux, otherwise a 64-bit ELF would be produced.
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
#0 0x0051d8e6 in ?? ()
#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.

I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 has it write out the file in that format.
To see supported formats run
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.

I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
I mean to say: would it suit your needs to have i586-mingw32msvc-gcc compile directly to ELF format binaries (instead of the PEs you're currently getting)? A description of how to do things in the other direction can be found here. I imagine it will be a little hacky but entirely possible to make this work for you in the other direction (I must admit I haven't tried it).

Executing a flat binary file under Linux

Is there a way to execute a flat binary image in Linux, using a syntax something like:
nasm -f bin -o foo.bin foo.asm
runbinary foo.bin

The Linux kernel can load several different binary formats - ELF is just the most common, though the a.out format is also pretty well known.
The supported binary formats are controlled by which binfmt modules are loaded or compiled in to the kernel (they're under the Filesystem section of the kernel config). There's a binfmt_flat for uClinux BFLT flat format binaries which are pretty minimal - they can even be zlib compressed which will let you make your binary even smaller, so this could be a good choice.
It doesn't look like nasm natively supports this format, but it's pretty easy to add the necessary header manually as Jim Lewis describes for ELF. There's a description of the format here.

Is there some reason you don't want to use "-f elf" instead of "-f bin"?
I think Linux won't run a binary that's not in ELF format. I can't find a tool that converts flat binaries to ELF, but you can cheat by putting the ELF information in foo.asm,
using the technique described here :
We can look at the ELF specification, and /usr/include/linux/elf.h, and executables created by the standard tools, to figure out what our empty ELF executable should look like. But, if you're the impatient type, you can just use the one I've supplied here:
BITS 32
org 0x08048000
ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
; your program here
filesize equ $ - $$
This image contains an ELF header, identifying the file as an Intel 386 executable, with no section header table and a program header table containing one entry. Said entry instructs the program loader to load the entire file into memory (it's normal behavior for a program to include its ELF header and program header table in its memory image) starting at memory address 0x08048000 (which is the default address for executables to load), and to begin executing the code at _start, which appears immediately after the program header table. No .data segment, no .bss segment, no commentary, nothing but the bare necessities.
So, let's add in our little program:
; tiny.asm
org 0x08048000
;
; (as above)
;
_start:
mov bl, 42
xor eax, eax
inc eax
int 0x80
filesize equ $ - $$
and try it out:
$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
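For comparison, the same header can be packed with Python's struct module, which makes the field sizes explicit. This is a sketch that mirrors the NASM listing field by field; the function name and the hand-assembled EXIT_42 bytes (my encoding of the four instructions in tiny.asm) are my own.

```python
import struct

LOAD_ADDR = 0x08048000
EHDR_FMT = "<16sHHIIIIIHHHHHH"    # Elf32_Ehdr, 52 bytes
PHDR_FMT = "<IIIIIIII"            # Elf32_Phdr, 32 bytes


def build_tiny_elf(code: bytes) -> bytes:
    """Wrap raw 32-bit x86 code in the minimal ELF header from the listing."""
    ehsize = struct.calcsize(EHDR_FMT)
    phsize = struct.calcsize(PHDR_FMT)
    filesize = ehsize + phsize + len(code)
    e_ident = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        EHDR_FMT, e_ident,
        2, 3, 1,                        # e_type, e_machine, e_version
        LOAD_ADDR + ehsize + phsize,    # e_entry: code follows the headers
        ehsize, 0, 0,                   # e_phoff, e_shoff, e_flags
        ehsize, phsize, 1, 0, 0, 0)     # sizes, e_phnum, no section headers
    phdr = struct.pack(
        PHDR_FMT, 1, 0, LOAD_ADDR, LOAD_ADDR,   # p_type, p_offset, addresses
        filesize, filesize, 5, 0x1000)          # sizes, p_flags, p_align
    return ehdr + phdr + code


# mov bl, 42 / xor eax, eax / inc eax / int 0x80
EXIT_42 = bytes.fromhex("b32a 31c0 40 cd80")
```

Writing build_tiny_elf(EXIT_42) to a file and marking it executable should give a file equivalent to the NASM output, on a system that can run 32-bit x86 binaries.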

Minimally, Linux will need to figure out the format of the executable and it will get that from the first bytes. For example, if it's a script that will be #!, shebang. If it's ELF that will be 0x7F 'E' 'L' 'F'. Those magic numbers will determine the handler from a lookup.
So you're gonna need a header with a recognized magic number. You can get a list of the formats registered through binfmt_misc in /proc/sys/fs/binfmt_misc. Getting a list of native binary formats is (unfortunately) a little trickier.
bFLT may be a good choice. Indeed, it's a popular embedded executable format. But you can also squeeze ELF down quite far. This article got an ELF executable down to 45 bytes. That said, you'd be squeezing it down mostly by hand rather than by tool.
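The magic-number lookup can be illustrated from userspace with a few lines of Python. This is only a sketch of the idea, not how the kernel implements it, and the table and function name are my own.

```python
MAGICS = [
    (b"#!", "script (shebang)"),
    (b"\x7fELF", "ELF"),
    (b"bFLT", "bFLT"),
]


def guess_format(head: bytes) -> str:
    """Guess an executable's format from its leading bytes."""
    for magic, name in MAGICS:
        if head.startswith(magic):
            return name
    return "unknown"
```

A flat binary from nasm -f bin starts with whatever its first instruction encodes to, so it normally falls into the "unknown" case, which is why it needs a header.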
