Which assembly is used by Linux Kernel? Is it really NASM? - linux

I'm reading https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-1.html and it reads some assembly like
.section ".reset", "ax", %progbits
.code16
.globl _start
_start:
.byte 0xe9
.int _start16bit - ( . + 2 )
...
There's a line where he compiles like this
nasm -f bin boot.nasm && qemu-system-x86_64 boot
So I thougth it was NASM assembly for linux. I went and found https://asmtutor.com/# which says it uses NASM assembly for linux. However, it's not the same thing. Just to name a few: linux kernel uses .section instead of SECTION, .globl instead of global and I don't recognize what .byte, .int, etc does.
So which assembly does linux use and where can I learn it?

The Linux kernel uses the GAS assembler(GNU Assembler) which is part of GCC. You can find reference documentation on it here.
You can find a pretty thorough introduction to GAS here provided that you already have a basic understanding of assembly in general.
As for .byte and .int, .byte places 1 or more 1 byte values that follow it into memory at the current assembler address, and .int does the same but for 32 bit integers instead of bytes.

Related

How do you assemble, link and run a .s file in linux?

I'm getting a weird error message when trying to assemble and run a .s file using AT&T Intel Syntax. Not sure if I'm even using the correct architecture to begin with, or if I'm having syntax errors, if I'm not using the correct commands to assemble and link, etc. Completely lost and I do not know where to begin.
So basically, I have a file called yea.s , which contains some simple assembler instructions. I then try to compile it using the command as yea.s -o yea.o and then link is using ld yea.o -o yea. When running ld, I get this weird message:ld: warning: cannot find entry symbol _start; defaulting to 000000440000.
This is the program im trying to run, very simple and doesn't really do anything.
resMsg: .asciz "xxxxxxxx"
.text
.global main
main:
pushq $0
ret
I just cannot figure out what's going on. Obviously, this is for school homework. I'm not looking for the answer to the homework, obviously, but this is the starting point to where I can actually start the coding. And I just cant figure out how to simple run the program, which it doesn't say in the assignment. Anyway, thanks in advance guys!
Linux executables require an entry point to be specified. The entry point is the address of the first instruction to be executed in your program. If not specified otherwise, the link editor looks for a symbol named _start to use as an entry point. Your program does not contain such a symbol, thus the linker complains and picks the beginning of the .text section as the entry point. To fix this problem, rename main to _start.
Note further that unlike on DOS, there is nothing to return to from _start. So your attempt to return is going to cause a crash. Instead, call the system call sys_exit to exit the program:
mov $0, %edi # exit status
mov $60, %eax # system call number
syscall # perform exit call
Alternatively, if you want to use the C runtime environment and call functions from the C library, leave your program as is and instead assemble and link using the C compiler driver cc:
cc -o yea yea.s
If you do so, the C runtime environment provides the entry point for you and eventually tries to call a function main which is where your code comes in. This approach is required if you want to call functions from the C library. If you do it this way, make sure that main follows the SysV ABI (calling convention).
Note that even then your code is incorrect. The return value of a function is given in the eax (resp. rax) register and not pushed on the stack. To return zero from main, write
mov $0, %eax # exit status
ret # return from function
In all currently supported versions of Ubuntu open the terminal and type:
sudo apt install as31 nasm
as31: Intel 8031/8051 assembler
This is a fast, simple, easy to use Intel 8031/8051 assembler.
nasm: General-purpose x86 assembler
Netwide Assembler. NASM will currently output flat-form binary files, a.out, COFF and ELF Unix object files, and Microsoft 16-bit DOS and Win32 object files.
If you are using NASM in Ubuntu 18.04, the commands to compile and run an .asm file named example.asm are:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file

What sections are necessary in a minimal dynamically-linked ELF program?

I assembled a simple "Hello, world" program and linked it using TCC, after which I got 4196 bytes of an executable.
The program has 31 sections: ['', '.text', '.data', '.bss', '.symtab', '.strtab', '.rel.text', '.rodata', '.rodata.cst4', '.note.GNU-stack', '.init', '.rel.init', '.gnu.linkonce.t.__x86.get_pc_thunk.bx', '.fini', '.rel.fini', '.text.unlikely', '.text.__x86.get_pc_thunk.bx', '.eh_frame', '.rel.eh_frame', '.preinit_array', '.init_array', '.fini_array', '.interp', '.dynsym', '.dynstr', '.hash', '.dynamic', '.got', '.plt', '.rel.got', '.shstrtab']. I feel that's a real lot for such a simple binary - which ones are actually necessary here for my program to run?
Here's the source code and the way I compiled it:
extern printf
global main
section .data
msg: db "Hello World!", 0
section .text
main:
;; puts (msg)
push msg
call printf
add esp, 4
;; return 0
mov eax, 0
ret
nasm main.asm -f elf32 && tcc main.o -o main
Tested on 32bit/ubuntu:16.04 Docker image.
Note: this question is different from this one in that I'm not looking for a tensy Linux ELF, but one that allows me to call dynamic symbols. I believe that due to the nature of dynamic linking, I need some extra sections.
I believe that due to the nature of dynamic linking, I need some extra sections.
Your belief is mistaken. No section is necessary at runtime, only segments matter.
A runnable dynamically-linked ELF binary will have at least one PT_LOAD segment, a PT_INTERP segment, and PT_DYNAMIC segment.

How to link the C Runtime Library with 'ld'?

I'm learning assembly with NASM for a class I have in college. I would like to link the C Runtime Library with ld, but I just can't seem to wrap my head around it. I have a 64 bit machine with Linux Mint installed.
The reason I'm confused is that -- to my knowledge -- instead of linking the C runtime, gcc copies the things that you need into your program. I might be wrong though, so don't hesitate to correct me on this, please.
What I did up to this point is, to link it using gcc. That produces a mess of a machine code that I'm unable to follow though, even for a small program like swapping rax with rbx, which isn't that great for learning purposes. (Please note that the program works.)
I'm not sure if it's relevant, but these are the commands that I'm using to compile and link:
# compilation
nasm -f elf64 swap.asm
# gcc
gcc -o swap swap.o
# ld, no c runtime
ld -s -o swap swap.o
Thank you in advance!
Conclusion:
Now that I have a proper answer to the question, here are a few things that I would like to mention. Linking glibc dynamically can be done like in Z boson's answer (for 64 bit systems). If you would like to do it statically, do follow this link (that I'm re-posting from Z boson's answer).
Here's an article that Jester posted, about how programs start in linux.
To see what gcc does to link your .o-s, try this command out: gcc -v -o swap swap.o. Note that 'v' stands for 'verbose'.
Also, you should read this if you are interested in 64 bit assembly.
Thank you for your answers and helpful insight! End of speech.
Here is an example which uses libc without using GCC.
extern printf
extern _exit
section .data
hello: db 'Hello world!',10
section .text
global _start
_start:
xor eax, eax
mov edi, hello
call printf
mov rax, 0
jmp _exit
Compile and link like this:
nasm -f elf64 hello.asm
ld hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -m elf_x86_64
This has worked fine so far for me but for static linkage it's complicated.
If you want to call simple library functions like atoi, but still avoid using the C runtime, you can do that. (i.e. you write _start, rather than just writing a main that gets called after a bunch of boiler-plate code runs.)
gcc -o swap -nostartfiles swap.o
As people say in comments, some parts of glibc depend on constructors/destructors run from the standard startup files. Probably this is the case for stdio (puts/printf/scanf/getchar), and maybe malloc. A lot of functions are "pure" functions that just process the input they're given, though. sprintf/sscanf might be ok to use.
For example:
$ cat >exit64.asm <<EOF
section .text
extern exit
global _start
_start:
xor edi, edi
jmp exit ; doesn't return, so optimize like a tail-call
;; or make the syscall directly, if the jmp is commented
mov eax, 231 ; exit(0)
syscall
; movl eax, 1 ; 32bit call
; int 0x80
EOF
$ yasm -felf64 exit64.asm && gcc -nostartfiles exit64.o -o exit64-dynamic
$ nm exit64-dynamic
0000000000601020 D __bss_start
0000000000600ec0 d _DYNAMIC
0000000000601020 D _edata
0000000000601020 D _end
U exit##GLIBC_2.2.5
0000000000601000 d _GLOBAL_OFFSET_TABLE_
00000000004002d0 T _start
$ ltrace ./exit64-dynamic
enable_breakpoint pid=11334, addr=0x1, symbol=(null): Input/output error
exit(0 <no return ...>
+++ exited (status 0) +++
$ strace ... # shows the usual system calls by the runtime dynamic linker

Linux (64-bit), nasm and gdb

I was searching other threads without luck.
My problem is perhaps simple but frustrating.
I'm compiling two files on 64-bit Ubuntu 11.04:
nasm -f elf64 -g file64.asm
gcc -g -o file file.c file64.o
Then I debug the resulting executables with gdb.
With C, everything is OK.
However, when debugging assembly, the source code is "not visible" to the debugger. I'm getting the following output:
(gdb) step
Single stepping until exit from function line,
which has no line number information.
0x0000000000400962 in convert ()
A quick investigation with:
objdump --source file64.o
shows that the assembly source code (and line information) is contained in the file.
Why can't I see it in a debug session? What am I doing wrong?
These problems arose after moving to 64-bit Ubuntu. In the 32-bit Linux it worked (as it should).
With NASM, I've had much better experience in gdb when using the dwarf debugging format. gdb then treats the assembly source as if it were any other language (i.e., no disassemble commands necessary)
nasm -f elf64 -g -F dwarf file64.asm
(Versions 2.03.01 and later automatically enable -g if -F is specified.)
I'm using NASM version 2.10.07. I'm not sure if that makes a difference or not.
GDB is a source-level (or symbolic) debugger, which means that it's supposed to work with 'high-level programming languages' ... which is not you're case!
But wait a second, because, from a debugger's point of view, debugging ASM programs is way easier than higher level languages: there's almost nothing to do! The program binary always contains the assembly instruction, there're just written in their machine format, instead of ascii format.
And GDB has the ability to convert it for you. Instead of executing list to see the code, use disassemble to see a function code:
(gdb) disassemble <your symbol>
Dump of assembler code for function <your symbol>:
0x000000000040051e <+0>: push %rbp
0x000000000040051f <+1>: mov %rsp,%rbp
=> 0x0000000000400522 <+4>: mov 0x20042f(%rip),%rax
0x0000000000400529 <+11>: mov %rax,%rdx
0x000000000040052c <+14>: mov $0x400678,%eax
0x0000000000400531 <+19>: mov %rdx,%rcx
or x/5i $pc to see 5 i nstruction after your $pc
(gdb) x/5i $pc
=> 0x400522 <main+4>: mov 0x20042f(%rip),%rax
0x400529 <main+11>: mov %rax,%rdx
0x40052c <main+14>: mov $0x400678,%eax
0x400531 <main+19>: mov %rdx,%rcx
0x400534 <main+22>: mov $0xc,%edx
then use stepi (si) instread of step and nexti (ni) instead of next.
display $pc could also be useful to print the current pc whenever the inferior stops (ie, after each nexti/stepi.
For anyone else stuck with the broken things on NASM (the bug is not fixed so far): just download the NASM git repository and switch to version 2.7, which is probably the last version that works fine, i.e. supports gdb. Building from source this outdated version is only a workaround (you don't have support for the last ISA for example), but it's sufficient for most students.
GDB might not know where to search for your source files. Try to explicitly tell it with directory.

Compile/run assembler in Linux?

I'm fairly new to Linux (Ubuntu 10.04) and a total novice to assembler. I was following some tutorials and I couldn't find anything specific to Linux.
So, my question is, what is a good package to compile/run assembler and what are the command line commands to compile/run for that package?
The GNU assembler is probably already installed on your system. Try man as to see full usage information. You can use as to compile individual files and ld to link if you really, really want to.
However, GCC makes a great front-end. It can assemble .s files for you. For example:
$ cat >hello.s <<"EOF"
.section .rodata # read-only static data
.globl hello
hello:
.string "Hello, world!" # zero-terminated C string
.text
.global main
main:
push %rbp
mov %rsp, %rbp # create a stack frame
mov $hello, %edi # put the address of hello into RDI
call puts # as the first arg for puts
mov $0, %eax # return value = 0. Normally xor %eax,%eax
leave # tear down the stack frame
ret # pop the return address off the stack into RIP
EOF
$ gcc hello.s -no-pie -o hello
$ ./hello
Hello, world!
The code above is x86-64. If you want to make a position-independent executable (PIE), you'd need lea hello(%rip), %rdi, and call puts#plt.
A non-PIE executable (position-dependent) can use 32-bit absolute addressing for static data, but a PIE should use RIP-relative LEA. (See also Difference between movq and movabsq in x86-64 neither movq nor movabsq are a good choice.)
If you wanted to write 32-bit code, the calling convention is different, and RIP-relative addressing isn't available. (So you'd push $hello before the call, and pop the stack args after.)
You can also compile C/C++ code directly to assembly if you're curious how something works:
$ cat >hello.c <<EOF
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
return 0;
}
EOF
$ gcc -S hello.c -o hello.s
See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output, and writing useful small functions that will compile to interesting output.
The GNU assembler (gas) and NASM are both good choices. However, they have some differences, the big one being the order you put operations and their operands.
gas uses AT&T syntax (guide: https://stackoverflow.com/tags/att/info):
mnemonic source, destination
nasm uses Intel style (guide: https://stackoverflow.com/tags/intel-syntax/info):
mnemonic destination, source
Either one will probably do what you need. GAS also has an Intel-syntax mode, which is a lot like MASM, not NASM.
Try out this tutorial: http://asm.sourceforge.net/intro/Assembly-Intro.html
See also more links to guides and docs in Stack Overflow's x86 tag wiki
If you are using NASM, the command-line is just
nasm -felf32 -g -Fdwarf file.asm -o file.o
where 'file.asm' is your assembly file (code) and 'file.o' is an object file you can link with gcc -m32 or ld -melf_i386. (Assembling with nasm -felf64 will make a 64-bit object file, but the hello world example below uses 32-bit system calls, and won't work in a PIE executable.)
Here is some more info:
http://www.nasm.us/doc/nasmdoc2.html#section-2.1
You can install NASM in Ubuntu with the following command:
apt-get install nasm
Here is a basic Hello World in Linux assembly to whet your appetite:
http://web.archive.org/web/20120822144129/http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html
I hope this is what you were asking...
There is also FASM for Linux.
format ELF executable
segment readable executable
start:
mov eax, 4
mov ebx, 1
mov ecx, hello_msg
mov edx, hello_size
int 80h
mov eax, 1
mov ebx, 0
int 80h
segment readable writeable
hello_msg db "Hello World!",10,0
hello_size = $-hello_msg
It comiles with
fasm hello.asm hello
My suggestion would be to get the book Programming From Ground Up:
http://nongnu.askapache.com/pgubook/ProgrammingGroundUp-1-0-booksize.pdf
That is a very good starting point for getting into assembler programming under linux and it explains a lot of the basics you need to understand to get started.
The assembler(GNU) is as(1)
3 syntax (nasm, tasm, gas ) in 1 assembler, yasm.
http://www.tortall.net/projects/yasm/
For Ubuntu 18.04 installnasm . Open the terminal and type:
sudo apt install as31 nasm
nasm docs
For compiling and running:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file

Resources