Minimal assembly program ARM - linux

I am learning assembly, but I'm having trouble understanding how a program is executed by a CPU when using gnu/linux on arm. I will elaborate.
Problem:
I want my program to return 5 as it's exit status.
Assembly for this is:
.text
.align 2
.global main
.type main, %function
main:
mov w0, 5 //move 5 to register w0
ret //return
I then assemble it with:
as prog.s -o prog.o
Everything ok up to here. I understand I then have to link my object file to the C library in order to add additional code that will make my program run. I then link with(paths are omitted for clarity):
ld crti.o crtn.o crt1.o libc.so prog.o ld-linux-aarch64.so.1 -o prog
After this, things work as expected:
./prog; echo $?
5
My problem is that I can't figure out what the C standard library is actually doing here. I more or less understand crti/n/1 are adding entry code to my program (eg the .init and .start sections), but no clue what's libc purpose.
I am interested at what would be a minimal assembly implementation of "returning 5 as exit status"
Most resources on the web focus on the instructions and program flow once you are in main. I am really interested at what are all the steps that go on once I execute with ./. I am now going through computer architecture textbooks, but I hope I can get a little help here.

The C language starts at main() but for C to work you in general need at least a minimal bootstrap. For example before main can be called from the C bootstrap
1) stack/stackpointer
2) .data initialized
3) .bss initalized
4) argc/argv prepared
And then there is C library which there are many/countless C libraries and each have their own designs and requirements to be satisfied before main is called. The C library makes system calls into the system so this starts to become system (operating system, Linux, Windows, etc) dependent, depending on the design of the C library that may be a thin shim or heavily integrated or somewhere in between.
Likewise for example assuming that the operating system is taking the "binary" (binary formats supported by the operating system and rules for that format are defined by the operating system and the toolchain must conform likewise the C library (even though you see the same brand name sometimes assume toolchain and C library are separate entities, one designed to work with the other)) from a non volatile media like a hard drive or ssd and copying the relevant parts into memory (some percentage of the popular, supported, binary file formats, are there for debug or file format and not actually code or data that is used for execution).
So this leaves a system level design option of does the binary file format indicate .data, .bss, .text, etc (note that .data, .bss, .text are not standards just convention most people know what that means even if a particular toolchain did not choose to use those names for sections or even the term sections).
If so the operating systems loader that takes the program and loads it into memory can choose to put .data in the right place and zero .bss for you so that the bootstrap does not have to. In a bare-metal situation the bootstrap would normally handle the read/write items because it is not loaded from media by some other software it is often simply in the address space of the processor on a rom of some flavor.
Likewise argv/argc could be handled by the operating systems tool that loads the binary as it had to parse out the location of the binary from the command line assuming the operating system has/uses a command line interface. But it could as easily simply pass the command line to the bootstrap and the bootstrap has to do it, these are system level design choices that have little to do with C but everything to do with what happens before main is called.
The memory space rules are defined by the operating system and between the operating system and the C library which often contains the bootstrap due to its intimate nature but I guess the C library and bootstrap could be separate. So linking plays a role as well, does this operating system support protection is it just read/write memory and you just need to spam it in there or are there separate regions for read/only (.text, .rodata, etc) and read/write (.data, .bss, etc). Someone needs to handle that, linker script and bootstrap often have a very intimate relationship and the linker script solution is specific to a toolchain not assumed to be portable, why would it, so while there are other solutions the common solution is that there is a C library with a bootstrap and linker solution that are heavily tied to the operating system and target processor.
And then you can talk about what happens after main(). I am happy to see you are using ARM not x86 to learn first, although aarch64 is a nightmare for a first one, not the instruction set just the execution levels and all the protections, you can go a long long way with this approach but there are some things and some instructions you cannot touch without going bare metal. (assuming you are using a pi there is a very good bare-metal forum with a lot of good resources).
The gnu tools are such that binutils and gcc are separate but intimately related projects. gcc knows where things are relative to itself so assuming you combined gcc with binutils and glibc (or you just use the toolchain you found), gcc knows relative to where it executed to find these other items and what items to pass when it calls the linker (gcc is to some extent just a shell that calls a preprocessor a compiler the assembler then linker if not instructed not to do these things). But the gnu binutils linker does not. While as distasteful as it feels to use, it is easier just to
gcc test.o -o test
rather than figure out for that machine that day what all you need on the ld command line and what paths and depending on design the order on the command line of the arguments.
Note you can probably get away with this as a minimum
.global main
.type main, %function
main:
mov w0, 5 //move 5 to register w0
ret //return
or see what gcc generates
unsigned int fun ( void )
{
return 5;
}
.arch armv8-a
.file "so.c"
.text
.align 2
.p2align 4,,11
.global fun
.type fun, %function
fun:
mov w0, 5
ret
.size fun, .-fun
.ident "GCC: (GNU) 10.2.0"
I am used to seeing more fluff in there:
.arch armv5t
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun
.syntax unified
.arm
.type fun, %function
fun:
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #5
bx lr
.size fun, .-fun
.ident "GCC: (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
.section .note.GNU-stack,"",%progbits
Either way, you can look up each of the assembly language items and decide if you really need them or not, depends in part on if you feel the need to use a debugger or binutils tools to tear apart the binary (do you really need to know the size of fun for example in order to learn assembly language?)
If you wish to control all of the code and not link with a C library you are more than welcome to you need to know the memory space rules for the operating system and create a linker script (the default one may in part be tied to the C library and is no doubt overly complicated and not something you would want to use as a starting point). In this case being two instructions in main you simply need the one address space valid for the binary, however the operating system enters (ideally using the ENTRY(label), which could be main if you want but often is not _start is often found in linker scripts but is not a rule either, you choose. And as pointed out in comments you would need to make the system call to exit the program. System calls are specific to the operating system and possibly version and not specific to a target (ARM), so you would need to use the right one in the right way, very doable, your whole project linker script and assembly language could be maybe a couple dozen lines of code total. We are not here to google those for you so you would be on your own for that.
Part of your problem here is you are searching for compiler solutions when the compiler in general has absolutely nothing to do with any of this. A compiler takes one language turns it into another language. An assembler same deal but one is simple and the other usually machine code, bits. (some compilers output bits not text as well). It is equivalent to looking up the users manual for a table saw to figure out how to build a house. The table saw is just a tool, one of the tools you need, but just a generic tool. The compiler, specific gnu's gcc, is generic it does not even know what main() is. Gnu follows the Unix way so it has a separate binutils and C library, separate developments, and you do not have to combine them if you do not want to, you can use them separately. And then there is the operating system so half your question is buried in operating system details, the other half in a particular C library or other solution to connect main() to the operating system.
Being open source you can go look at the bootstrap for glibc and others and see what they do. Understanding this type of open source project the code is nearly unreadable, much easier to disassemble sometimes, YMMV.
You can search for the Linux system calls for arm aarch64 and find the one for exit, you can likely see that the open source C libraries or bootstrap solutions you find that are buried under what you are using today, will call exit but if not then there is some other call they need to make to return back to the operating system. It is unlikely it is a simple ret with a register that holds a return value, but technically that is how someone could choose to do it for their operating system.
I think you will find for Linux on arm that Linux is going to parse the command line and pass argc/argv in registers, so you can simply use them. And likely going to prep .data and .bss so long as you build the binary correctly (you link it correctly).

Here's a bare minimum example.
Run it with:
gcc -c thisfile.S && ld thisfile.o && ./a.out
Source code:
#include <sys/syscall.h>
.global _start
_start:
movq $SYS_write, %rax
movq $1, %rdi
movq $st, %rsi
movq $(ed - st), %rdx
syscall
movq $SYS_exit, %rax
movq $1, %rdi
syscall
st:
.ascii "\033[01;31mHello, OS World\033[0m\n"
ed:

Related

nasm system calls Linux

I have got a question about linux x86 system calls in assembly.
When I am creating a new assembly program with nasm on linux, I'd like to know which system calls I have to use for doing a specific task (for example reading a file, writing output, or simple exiting...). I know some syscall because I've read them on some examples taken around internet (such as eax=0, ebx=1 int 0x80 exit with return value of 1), but nothing more... How could I know if there are other arguments for exit syscall? Or for another syscall? I'm looking for a docs that explain which syscalls have which arguments to pass in which registers.
I've read the man page about exit function etc. but it didn't explain to me what I'm asking.
Hope I was clear enough,
Thank you!
The x86 wiki (which I just updated again :) has links to the system call ABI (what the numbers are for every call, where to put the params, what instruction to run, and which registers will clobbered on return). This is not documented in the man page because it's architecture-specific. Same for binary constants: they don't have to be the same on every architecture.
grep -r O_APPEND /usr/include for your target architecture to recursively search the .h files.
Even better is to set things up so you can use the symbolic constants in your asm source, for readability and to avoid the risk of errors.
The gcc actually does use the C Preprocessor when processing .S files, but including most C header files will also get you some C prototypes.
Or convert the #defines to NASM macros with sed or something. Maybe feed some #include<> lines to the C preprocessor and have it print out just the macro definitions.
printf '#include <%s>\n' unistd.h sys/stat.h |
gcc -dD -E - |
sed -ne 's/^#define \([A-Za-z_0-9]*\) \(.\)/\1\tequ \2/p'
That turns every non-empty #define into a NASM symbol equ value. The resulting file has many lines of error: expression syntax error when I tried to run NASM on it, but manually selecting some valid lines from that may work.
Some constants are defined in multiple steps, e.g. #define S_IRGRP (S_IRUSR >> 3). This might or might not work when converted to NASM equ symbol definitions.
Also note that in C 0666, is an octal constant. In NASM, you need either 0o666 or 666o; a leading 0 is not special. Otherwise, NASM syntax for hex and decimal constants is compatible with C.
Perhaps you are looking for something like linux/syscalls.h[1], which you have on your system if you've installed the Linux source code via apt-get or whatever your distro uses.
[1] http://lxr.free-electrons.com/source/include/linux/syscalls.h#L326

Linux exit function

I am trying to understand linux syscalls mechanism. I am reading a book and it in the book it says that exit function look like that(with gdb):
mov $0x0,%ebx
mov $0x1,%eax
80 int $0x80
I understand that this is a syscall to exit, but in my Debian it looks like that:
jmp *0x8049698
push $0x8
jmp 0x80482c0
maybe can someone explain me why it's not the same? When I try to do disas on 0x80482c0
gdb prints me:
No function contains specified address.
Also, can someone give me a good reference to Linux Internals material(as Windows internals)?
Thanks!
The function you most likely called is exit() from C Standard Library (see man 3 exit). This function is a library function which, in turn, calls SYS_exit system call, but not being a system call itself. You will not see that good looking int 0x80 code in your C program disassembly. All existing functions (exit(), syscall(), etc.) are called from some library, so your program is only doing call to that library, and those functions are not belong to your program.
If you want to see exactly that int 0x80 code -- you can inline that asm code in your C application. But this is considered a bad practice, though, as your code become architecture-dependent (only applicable to x86 architecture, in your case).
can someone give me a good reference to Linux Internals material
The code itself is the best up-to-date reference. All books are more or less outdated. Also look into Documentation/ directory in kernel sources.

How can I compile C code to get a bare-metal skeleton of a minimal RISC-V assembly program?

I have the following simple C code:
void main(){
int A = 333;
int B=244;
int sum;
sum = A + B;
}
When I compile this with
$riscv64-unknown-elf-gcc code.c -o code.o
If I want to see the assembly code I use
$riscv64-unknown-elf-objdump -d code.o
But when I explore the assembly code I see that this generates a lot of code which I assume is for Proxy Kernel support (I am a newbie to riscv). However, I do not want that this code has support for Proxy kernel, because the idea is to implement only this simple C code within an FPGA.
I read that riscv provides three types of compilation: Bare-metal mode, newlib proxy kernel and riscv Linux. According to previous research, the kind of compilation that I should do is bare metal mode. This is because I desire a minimum assembly without support for the operating system or kernel proxy. Assembly functions as a system call are not required.
However, I have not yet been able to find as I can compile a C code for get a skeleton of a minimal riscv assembly program. How can I compile the C code above in bare metal mode or for get a skeleton of a minimal riscv assembly code?
Warning: this answer is somewhat out-of-date as of the latest RISC-V Privileged Spec v1.9, which includes the removal of the tohost Control/Status Register (CSR), which was a part of the non-standard Host-Target Interface (HTIF) which has since been removed. The current (as of 2016 Sep) riscv-tests instead perform a memory-mapped store to a tohost memory location, which in a tethered environment is monitored by the front-end server.
If you really and truly need/want to run RISC-V code bare-metal, then here are the instructions to do so. You lose a bunch of useful stuff, like printf or FP-trap software emulation, which the riscv-pk (proxy kernel) provides.
First things first - Spike boots up at 0x200. As Spike is the golden ISA simulator model, your core should also boot up at 0x200.
(cough, as of 2015 Jul 13, the "master" branch of riscv-tools (https://github.com/riscv/riscv-tools) is using an older pre-v1.7 Privileged ISA, and thus starts at 0x2000. This post will assume you are using v1.7+, which may require using the "new_privileged_isa" branch of riscv-tools).
So when you disassemble your bare-metal program, it better
start at 0x200!!! If you want to run it on top of the proxy-kernel, it
better start at 0x10000 (and if Linux, it’s something even larger…).
Now, if you want to run bare metal, you’re forcing yourself to write up the
processor boot code. Yuck. But let’s punt on that and pretend that’s not
necessary.
(You can also look into riscv-tests/env/p, for the “virtual machine”
description for a physically addressed machine. You’ll find the linker script
you need and some macros.h to describe some initial setup code. Or better
yet, in riscv-tests/benchmarks/common.crt.S).
Anyways, armed with the above (confusing) knowledge, let’s throw that all
away and start from scratch ourselves...
hello.s:
.align 6
.globl _start
_start:
# screw boot code, we're going minimalist
# mtohost is the CSR in machine mode
csrw mtohost, 1;
1:
j 1b
and link.ld:
OUTPUT_ARCH( "riscv" )
ENTRY( _start )
SECTIONS
{
/* text: test code section */
. = 0x200;
.text :
{
*(.text)
}
/* data: Initialized data segment */
.data :
{
*(.data)
}
/* End of uninitalized data segement */
_end = .;
}
Now to compile this…
riscv64-unknown-elf-gcc -nostdlib -nostartfiles -Tlink.ld -o hello hello.s
This compiles to (riscv64-unknown-elf-objdump -d hello):
hello: file format elf64-littleriscv
Disassembly of section .text:
0000000000000200 <_start>:
200: 7810d073 csrwi tohost,1
204: 0000006f j 204 <_start+0x4>
And to run it:
spike hello
It’s a thing of beauty.
The link script places our code at 0x200. Spike will start at
0x200, and then write a #1 to the control/status register
“tohost”, which tells Spike “stop running”. And then we spin on an address
(1: j 1b) until the front-end server has gotten the message and kills us.
It may be possible to ditch the linker script if you can figure out how to
tell the compiler to move <_start> to 0x200 on its own.
For other examples, you can peruse the following repositories:
The riscv-tests repository holds the RISC-V ISA tests that are very minimal
(https://github.com/riscv/riscv-tests).
This Makefile has the compiler options:
https://github.com/riscv/riscv-tests/blob/master/isa/Makefile
And many of the “virtual machine” description macros and linker scripts can
be found in riscv-tests/env (https://github.com/riscv/riscv-test-env).
You can take a look at the “simplest” test at (riscv-tests/isa/rv64ui-p-simple.dump).
And you can check out riscv-tests/benchmarks/common for start-up and support code for running bare-metal.
The "extra" code is put there by gcc and is the sort of stuff required for any program. The proxy kernel is designed to be the bare minimum amount of support required to run such things. Once your processor is working, I would recommend running things on top of pk rather than bare-metal.
In the meantime, if you want to look at simple assembly, I would recommend skipping the linking phase with '-c':
riscv64-unknown-elf-gcc code.c -c -o code.o
riscv64-unknown-elf-objdump -d code.o
For examples of running code without pk or linux, I would look at riscv-tests.
I'm surprised no one mentioned gcc -S which skips assembly and linking altogether and outputs assembly code, albeit with a bunch of boilerplate, but it may be convenient just to poke around.

How to compile this asm code under linux with nasm and gcc?

I have the following source snippet from a book that I'm reading now. So I created an asm file and typed exactly. Then used nasm command (nasm -f elf test.asm) then tried to compile into an executable file by using gcc (gcc test.o -o test) then I get the following error.
Error:
ld: warning: ignoring file test.o, file was built for unsupported file format which is not the architecture being linked (x86_64)
Source code:
[BITS 16]
[SECTION .text]
START:
mov dx, eatmsg
mov ah, 9
int 21H
mov ax, 04C00H
int 21H
[SECTION .data]
eatmsg db "Eat at Joe's!", 13, 10, "$"
I guess the source code is not compatible with current generation of CPUs (the book is old...).
How do I fix this source code to run under x86_64 CPUs ?
If you want to continue using that old book to learn the basics (which is just fine, nothing wrong with learning the basics/old way before moving on to modern OS), you can run it in DOSBox, or a FreeDOS VM.
That's a 16 bits code, it was made to create pure binary code and not executables. You cannot run it on modern OSs like Linux without heavy modifications. And btw, that's an MS-DOS assembly, which will not work for Linux anyway (using int 21h which are MS-DOS services).
If you want to learn assembly, I suggest getting a more modern book, or setting up a virtual machine in which to learn with your book (although learning 16-bits assembly is really not useful nowadays).
First of all the code contains interrupts which will work in real mode only and in DOS (int 21h with values in regs), and linux works in protected mode, you cannot call these interrupts directly.
Next the code is 16 bit code, to make it a 64 bit code you need [BITS 64]
Third you have no entry point to the code. To make one you can write a main function in C and then call the starting label as a function in the assembly code.
Have a look at this thing: PC Assembly Language by Paul A. Carter

Good references for the syscalls

I need some reference but a good one, possibly with some nice examples. I need it because I am starting to write code in assembly using the NASM assembler. I have this reference:
http://bluemaster.iu.hio.no/edu/dark/lin-asm/syscalls.html
which is quite nice and useful, but it's got a lot of limitations because it doesn't explain the fields in the other registers. For example, if I am using the write syscall, I know I should put 1 in the EAX register, and the ECX is probably a pointer to the string, but what about EBX and EDX? I would like that to be explained too, that EBX determines the input (0 for stdin, 1 for something else etc.) and EDX is the length of the string to be entered, etc. etc. I hope you understood me what I want, I couldn't find any such materials so that's why I am writing here.
Thanks in advance.
The standard programming language in Linux is C. Because of that, the best descriptions of the system calls will show them as C functions to be called. Given their description as a C function and a knowledge of how to map them to the actual system call in assembly, you will be able to use any system call you want easily.
First, you need a reference for all the system calls as they would appear to a C programmer. The best one I know of is the Linux man-pages project, in particular the system calls section.
Let's take the write system call as an example, since it is the one in your question. As you can see, the first parameter is a signed integer, which is usually a file descriptor returned by the open syscall. These file descriptors could also have been inherited from your parent process, as usually happens for the first three file descriptors (0=stdin, 1=stdout, 2=stderr). The second parameter is a pointer to a buffer, and the third parameter is the buffer's size (as an unsigned integer). Finally, the function returns a signed integer, which is the number of bytes written, or a negative number for an error.
Now, how to map this to the actual system call? There are many ways to do a system call on 32-bit x86 (which is probably what you are using, based on your register names); be careful that it is completely different on 64-bit x86 (be sure you are assembling in 32-bit mode and linking a 32-bit executable; see this question for an example of how things can go wrong otherwise). The oldest, simplest and slowest of them in the 32-bit x86 is the int $0x80 method.
For the int $0x80 method, you put the system call number in %eax, and the parameters in %ebx, %ecx, %edx, %esi, %edi, and %ebp, in that order. Then you call int $0x80, and the return value from the system call is on %eax. Note that this return value is different from what the reference says; the reference shows how the C library will return it, but the system call returns -errno on error (for instance -EINVAL). The C library will move this to errno and return -1 in that case. See syscalls(2) and intro(2) for more detail.
So, in the write example, you would put the write system call number in %eax, the first parameter (file descriptor number) in %ebx, the second parameter (pointer to the string) in %ecx, and the third parameter (length of the string) in %edx. The system call will return in %eax either the number of bytes written, or the error number negated (if the return value is between -1 and -4095, it is a negated error number).
Finally, how do you find the system call numbers? They can be found at /usr/include/linux/unistd.h. On my system, this just includes /usr/include/asm/unistd.h, which finally includes /usr/include/asm/unistd_32.h, so the numbers are there (for write, you can see __NR_write is 4). The same goes for the error numbers, which come from /usr/include/linux/errno.h (on my system, after chasing the inclusion chain I find the first ones at /usr/include/asm-generic/errno-base.h and the rest at /usr/include/asm-generic/errno.h). For the system calls which use other constants or structures, their documentation tells which headers you should look at to find the corresponding definitions.
Now, as I said, int $0x80 is the oldest and slowest method. Newer processors have special system call instructions which are faster. To use them, the kernel makes available a virtual dynamic shared object (the vDSO; it is like a shared library, but in memory only) with a function you can call to do a system call using the best method available for your hardware. It also makes available special functions to get the current time without even having to do a system call, and a few other things. Of course, it is a bit harder to use if you are not using a dynamic linker.
There is also another older method, the vsyscall, which is similar to the vDSO but uses a single page at a fixed address. This method is deprecated, will result in warnings on the system log if you are using recent kernels, can be disabled on boot on even more recent kernels, and might be removed in the future. Do not use it.
If you download that web page (like it suggests in the second paragraph) and download the kernel sources, you can click the links in the "Source" column, and go directly to the source file that implements the system calls. You can read their C signatures to see what each parameter is used for.
If you're just looking for a quick reference, each of those system calls has a C library interface with the same name minus the sys_. So, for example, you could check out man 2 lseek to get the information about the parameters forsys_lseek:
off_t lseek(int fd, off_t offset, int whence);
where, as you can see, the parameters match the ones from your HTML table:
%ebx %ecx %edx
unsigned int off_t unsigned int

Resources