memory access when writing a linux kernel module in assembler - linux

i try writing a kernel module in assembler. In one time i needed a global vars. I define a dword in .data (or .bss) section, and in init function i try add 1 to var. My program seccesfully make, but insmod sey me:
$ sudo insmod ./test.ko
insmod: ERROR: could not insert module ./test.ko: Invalid module format
it's my assembler code in nasm:
[bits 64]
global init
global cleanup
extern printk
section .data
init_mess db "Hello!", 10, 0
g_var dd 0
section .text
init:
push rbp
mov rbp, rsp
inc dword [g_var]
mov rdi, init_mess
xor rax, rax
call printk
xor rax, rax
mov rsp, rbp
pop rbp
ret
cleanup:
xor rax, rax
ret
if i write adding in C code, all work good:
static i = 0;
static int __init main_init(void) { i++; return init(); }
But in this objdump -d test.ko write a very stainght code for me:
0000000000000000 <init_module>:
0: 55 push %rbp
1: ff 05 00 00 00 00 incl 0x0(%rip) # 7 <init_module+0x7>
7: 48 89 e5 mov %rsp,%rbp
a: e8 00 00 00 00 callq f <init_module+0xf>
f: 5d pop %rbp
10: c3 retq
What does this mean (incl 0x0(%rip))? How can I access memory? Please, help me :)
(My system is archlinux x86_64)
my C code for correct make a module:
#include <linux/module.h>
#include <linux/init.h>
MODULE_AUTHOR("Actics");
MODULE_DESCRIPTION("Description");
MODULE_LICENSE("GPL");
extern int init(void);
extern int cleanup(void);
static int __init main_init(void) { return init(); }
static void __exit main_cleanup(void) { cleanup(); }
module_init(main_init);
module_exit(main_cleanup);
and my Makefile:
obj-m := test.o
test-objs := inthan.o module.o
KVERSION = $(shell uname -r)
inthan.o: inthan.asm
nasm -f elf64 -o $# $^
build:
make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules

Kernel mode lives in the "negative" (ie. top) part of the address space, where 32 bit absolute addresses can not be used (because they are not sign-extended). As you have noticed, gcc uses rip-relative addresses to work around this problem which gives offsets from the current instruction pointer. You can make nasm do the same by using the DEFAULT REL directive. See the relevant section in the nasm documentation.

you can always use inline assembly
asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \
54 : "=&r" (flag), "=r" (roksum) \
55 : "1" (addr), "g" ((long)(size)), \
56 "rm" (limit));

Related

How can I find a proper DWARF symbol for address in a Linux kernel module?

I have a kernel module with DWARF debug info. All its ELF sections have zero start addresses.
Its DWARF info contains multiple overlapping code and data symbols. It also has many compilation units with zero DW_AT_low_pc address.
Is there a way to find the right DWARF symbol for a certain location in the binary file?
It also has many compilation units with zero DW_AT_low_pc address.
A kernel module is just a ET_REL object file. The kernel knows how to link it in into its address space.
The reason DW_AT_low_pcs are all 0s is because there are relocation entries that would have told the ld how to relocate them iff ld was to perform the link.
You could examine these entries with readelf -Wr module.ko, but it's much easier to ask GDB to do so.
Example:
// f.c
int foo(int x)
{
return x;
}
int bar(int x)
{
return foo(x);
}
gcc -c -g f.c -ffunction-sections
Resulting f.o has everything at 0:
nm f.o
0000000000000000 T bar
0000000000000000 T foo
readelf -wi f.o | grep low_pc
<1d> DW_AT_low_pc : 0x0
<35> DW_AT_low_pc : 0x0
<6c> DW_AT_low_pc : 0x0
But when loaded into GDB, you can see relocated entries (because GDB knows how to apply them):
gdb -q f.o
Reading symbols from f.o...
(gdb) p &foo
$1 = (int (*)(int)) 0x0 <foo>
(gdb) p &bar
$2 = (int (*)(int)) 0xc <bar> <<=== not 0
(gdb) disas/s foo
Dump of assembler code for function foo:
f.c:
2 {
0x0000000000000000 <+0>: push %rbp
0x0000000000000001 <+1>: mov %rsp,%rbp
0x0000000000000004 <+4>: mov %edi,-0x4(%rbp)
3 return x;
0x0000000000000007 <+7>: mov -0x4(%rbp),%eax
4 }
0x000000000000000a <+10>: pop %rbp
0x000000000000000b <+11>: retq
End of assembler dump.
(gdb) disas/s bar
Dump of assembler code for function bar:
f.c:
7 {
0x000000000000000c <+0>: push %rbp
0x000000000000000d <+1>: mov %rsp,%rbp
0x0000000000000010 <+4>: sub $0x8,%rsp
0x0000000000000014 <+8>: mov %edi,-0x4(%rbp)
8 return foo(x); <<=== correct line
0x0000000000000017 <+11>: mov -0x4(%rbp),%eax
0x000000000000001a <+14>: mov %eax,%edi
0x000000000000001c <+16>: callq 0x21 <bar+21>
9 }
0x0000000000000021 <+21>: leaveq
0x0000000000000022 <+22>: retq
End of assembler dump.

Basic input with x64 assembly code

I am writing a tutorial on basic input and output in assembly. I am using a Linux distribution (Ubuntu) that is 64 bit. For the first part of my tutorial I spoke about basic output and created a simple program like this:
global _start
section .text
_start:
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
That works great. The system prints the string and exits cleanly. For the next part of my tutorial, I simply want to read one character in from the keyboard. From my understanding of this web site we change the rdi register to be 0 for a sys_read call.
I first subtract 8 from the current rsp and then load that address into the rsi register. (That is where I want to store the char). When I compile and run my program it appears to work... but the terminal seems to mimick the input I type in again.
Here is the program:
global _start
section .text
_start:
sub rsp,8 ; allocate space on the stack to read
mov rdi,0 ; set rdi to 0 to indicate a system read
mov rsi,[rsp-8]
mov rdx,1
syscall
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
and this is what happens in my terminal...
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ nasm -felf64 rps.asm && ld rps.o && ./a.out
5
Hello, World
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ 5
5: command not found
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$
The input 5 is repeated back to the terminal after the program has exited. What is the proper way to read in a single char using NASM and Linux x64?
In your first code section you have to set the SYS_CALL to 0 for SYS_READ (as mentioned rudimentically in the other answer).
So check a Linux x64 SYS_CALL list for the appropriate parameters and try
_start:
mov rax, 0 ; set SYS_READ as SYS_CALL value
sub rsp, 8 ; allocate 8-byte space on the stack as read buffer
mov rdi, 0 ; set rdi to 0 to indicate a STDIN file descriptor
lea rsi, [rsp] ; set const char *buf to the 8-byte space on stack
mov rdx, 1 ; set size_t count to 1 for one char
syscall
it appears to work... but the terminal seems to mimick the input I type in again.
No, the 5 + newline that bash reads is the one you typed. Your program waited for input but didn't actually read the input, leaving it in the kernel's terminal input buffer for bash to read after your program exited. (And bash does its own echoing of terminal input because it puts the terminal in no-echo mode before reading; the normal mechanism for characters to appear on the command line as you type is for bash to print what it reads.)
How did your program manage to wait for input without reading any? mov rsi, [rsp-8] loads 8 bytes from that address. You should have used lea to set rsi to point to that location instead of loading what was in that buffer. So read fails with -EFAULT instead of reading anything, but interestingly it doesn't check this until after waiting for there to be some terminal input.
I used strace ./foo to trace system calls made by your program:
execve("./foo", ["./foo"], 0x7ffe90b8e850 /* 51 vars */) = 0
read(0, 5
NULL, 1) = -1 EFAULT (Bad address)
write(1, "Hello, World\n", 13Hello, World
) = 13
exit(0) = ?
+++ exited with 0 +++
Normal terminal input/output is mixed with the strace output; I could have used -o foo.trace or whatever. The cleaned-up version of the read system call trace (without the 5\n mixed in) is:
read(0, NULL, 1) = -1 EFAULT (Bad address)
So (as expected for _start in a static executable under Linux), the memory below RSP was zeroed. But anything that isn't a pointer to writeable memory would have produced the same result.
zx485's answer is correct but inefficient (large code-size and an extra instruction). You don't need to worry about efficiency right away, but it's one of the main reasons for doing anything with asm and there's interesting stuff to say about this case.
You don't need to modify RSP; you can use the red-zone (memory below RSP) because you don't need to make any function calls. This is what you were trying to do with rsp-8, I think. (Or else you didn't realize that it was only safe because of special circumstances...)
The read system call's signature is
ssize_t read(int fd, void *buf, size_t count);
so fd is an integer arg, so it's only looking at edi not rdi. You don't need to write the full rdi, just the regular 32-bit edi. (32-bit operand-size is usually the most efficient thing on x86-64).
But for zero or positive integers, just setting edi also sets rdi anyway. (Anything you write to edi is zero-extended into the full rdi) And of course zeroing a register is best done with xor same,same; this is probably the best-known x86 peephole optimization trick.
As the OP later commented, reading only 1 byte will leave the newline unread, when the input is 5\n, and that would make bash read it and print an extra prompt. We can bump up the size of the read and the space for the buffer to 2 bytes. (There'd be no downside to using lea rsi, [rsp-8] and leave a gap; I'm using lea rsi, [rsp-2] to pack the buffer right below argc on the stack, or below the return value if this was a function instead of a process entry point. Mostly to show exactly how much space is needed.)
; One read of up to 2 characters
; giving the user room to type a digit + newline
_start:
;mov eax, 0 ; set SYS_READ as SYS_CALL value
xor eax, eax ; rax = __NR_read = 0 from unistd_64.h
lea rsi, [rsp-2] ; rsi = buf = rsp-2
xor edi, edi ; edi = fd = 0 (stdin)
mov edx, 2 ; rdx = count = 2 char
syscall ; sys_read(0, rsp-2, 2)
; total = 16 bytes
This assembles like so:
+ yasm -felf64 -Worphan-labels -gdwarf2 foo.asm
+ ld -o foo foo.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
$ objdump -drwC -Mintel
0000000000400080 <_start>:
400080: 31 c0 xor eax,eax
400082: 48 8d 74 24 ff lea rsi,[rsp-0x1]
400087: 31 ff xor edi,edi
400089: ba 01 00 00 00 mov edx,0x1
40008e: 0f 05 syscall
; next address = ...90
; I left out the rest of the program so you can't actually *run* foo
; but I used a script that assembles + links, and disassembles the result
; The linking step is irrelevant for just looking at the code here.
By comparison, zx485's answer assembles to 31 bytes. Code size is not the most important thing, but when all else is equal, smaller is better for L1i cache density, and sometimes decode efficiency. (And my version has fewer instructions, too.)
0000000000400080 <_start>:
400080: 48 c7 c0 00 00 00 00 mov rax,0x0
400087: 48 83 ec 08 sub rsp,0x8
40008b: 48 c7 c7 00 00 00 00 mov rdi,0x0
400092: 48 8d 34 24 lea rsi,[rsp]
400096: 48 c7 c2 01 00 00 00 mov rdx,0x1
40009d: 0f 05 syscall
; total = 31 bytes
Note how those mov reg,constant instructions use the 7-byte mov r64, sign_extended_imm32 encoding. (NASM optimizes those to 5-byte mov r32, imm32 for a total of 25 bytes, but it can't optimize mov to xor because xor affects flags; you have to do that optimization yourself.)
Also, if you are going to modify RSP to reserve space, you only need mov rsi, rsp not lea. Only use lea reg1, [rsp] (with no displacement) if you're padding your code with longer instructions instead of using a NOP for alignment. For source registers other than rsp or rbp, lea won't be longer but it is still slower than mov. (But by all means use lea to copy-and-add. I'm just saying it's pointless when you can replace it with a mov.)
You could save even more space by using lea edx, [rax+1] instead of mov edx,1 at essentially no performance cost, but that's not something compilers normally do. (Although perhaps they should.)
You need to set eax to the system call number for read.

Confusion on assembly output of virtual table in Visual C++ 2015

I'm confused by the assembly output of Visual C++ 2015 (x86).
I want to know the virtual table layout in VC, so I write the following simple class with a virtual function.
#include <stdio.h>
struct Foo
{
virtual int GetValue()
{
uintptr_t vtbl = *(uintptr_t *)this;
uintptr_t slot0 = ((uintptr_t *)vtbl)[0];
uintptr_t slot1 = ((uintptr_t *)vtbl)[1];
printf("vtbl = 0x%08X\n", vtbl);
printf(" [0] = 0x%08X\n", slot0);
printf(" [1] = 0x%08X\n", slot1);
return 0xA11BABA;
}
};
extern "C" void Check();
int main()
{
Foo *pFoo = new Foo;
int x = pFoo->GetValue();
printf("x = 0x%08X\n", x);
printf("\n");
Check();
}
And to check the layout, I implement an assembly function (the magic name comes from the assembly output vtab.asm of vtab.cpp, and is the mangled version of Foo::GetValue).
.model flat
extern _printf : proc
extern ?GetValue#Foo##UAEHXZ : proc
.const
FUNC_ADDR db "Address of Foo::GetValue = 0x%08X", 10, 0
.code
_Check proc
push ebp
mov esp, ebp
push offset ?GetValue#Foo##UAEHXZ
push offset FUNC_ADDR
call _printf
add esp, 8
pop ebp
ret
_Check endp
end
Then, I compile and run.
ml /c check.asm
cl /Fa vtab.cpp check.obj
vtab
And get the following output on my computer.
vtbl = 0x00FF2174
[0] = 0x00FE1300
[1] = 0x6C627476
x = 0x0A11BABA
Address of Foo::GetValue = 0x00FE1300
It clearly shows the virtual function GetValue is at offset 0 of the virtual table. But the assembly output of vtab.cpp seems to imply GetValue is at offset 4 (see the following comments start with three semicolons).
; COMDAT ??_7Foo##6B#
CONST SEGMENT
??_7Foo##6B# DD FLAT:??_R4Foo##6B# ; Foo::`vftable'
DD FLAT:?GetValue#Foo##UAEHXZ ;;; GetValue at offset 4
CONST ENDS
; Function compile flags: /Odtp
; COMDAT ??0Foo##QAE#XZ
_TEXT SEGMENT
_this$ = -4 ; size = 4
??0Foo##QAE#XZ PROC ; Foo::Foo, COMDAT
; _this$ = ecx
push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx
mov eax, DWORD PTR _this$[ebp]
mov DWORD PTR [eax], OFFSET ??_7Foo##6B# ;;; Init ptr to virtual table
mov eax, DWORD PTR _this$[ebp]
mov esp, ebp
pop ebp
ret 0
??0Foo##QAE#XZ ENDP ; Foo::Foo
Thanks for your answering!
Update
#Hans Passant This seems to be a bug. I ml /c the assembly output vtab.asm (with a few symbols deletion) and link it with check.obj to get an exe vtab2.exe. But vtab2.exe won't run correctly. Then I modify the following code
; COMDAT ??_7Foo##6B#
CONST SEGMENT
??_7Foo##6B# DD FLAT:??_R4Foo##6B# ; Foo::`vftable'
DD FLAT:?GetValue#Foo##UAEHXZ
CONST ENDS
to
; COMDAT ??_7Foo##6B#
CONST SEGMENT
__NOT_USED_ DD FLAT:??_R4Foo##6B# ; Foo::`vftable'
??_7Foo##6B# DD FLAT:?GetValue#Foo##UAEHXZ
CONST ENDS
and ml and link again to get vtab3.exe. Now vtab3.exe runs correctly and produces an output similar to vtab.exe.
I don't think Microsoft would consider this a bug. Yes, the assembly output should have the vtable symbol on the second element of the vtable so that the RTTI entry appears at offset -4 of the table. However the table should also be in a COMDAT section, but instead there's only a comment in the assembly output (; COMDAT) that indicates this. That's because while the PECOFF object file format supports COMDAT sections, the assembler (MASM, invoked as ml) doesn't. There's no way for the compiler to generate an assembly file that actually corresponds to the contents of the object file it creates.
Or to put it another way, the assembly output isn't meant to be assembled. It's just meant to be informative. Even with your fix applied the assembly output doesn't generate the same object file the compiler does. If you did this in a more realistic project where Foo was used in more than one object file you'd get multiple definition errors when linking. If you want to see the real output of the compiler you need to look at the object file.
For example if you use dumpbin /all vtab.obj and go through its output, you'll see something like:
SECTION HEADER #C
.rdata name
...
40301040 flags
Initialized Data
COMDAT; sym= "const Foo::`vftable'" (??_7Foo##6B#)
4 byte align
Read Only
RAW DATA #C
00000000: 00 00 00 00 00 00 00 00 ........
RELOCATIONS #C
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000000 DIR32 00000000 34 ??_R4Foo##6B# (const Foo::`RTTI Complete Object Locator')
00000004 DIR32 00000000 1F ?GetValue#Foo##UAEHXZ (public: virtual int __thiscall Foo::GetValue(void))
...
COFF SYMBOL TABLE
...
026 00000000 SECTC notype Static | .rdata
Section length 8, #relocs 2, #linenums 0, checksum 0, selection 6 (pick largest)
028 00000004 SECTC notype External | ??_7Foo##6B# (const Foo::`vftable')
It's not easy to understand, but all the information about the actual layout of the vtable is given. The symbol for the vtable, ??_7Foo##6B# (const Foo::`vftable'), is at offset 00000004 of SECTC or section number 0xC. Section #C is 8 bytes long and has relocations for the RTTI locator and Foo::GetValue that are applied at offsets 00000000 and 00000004 of the section. So you can see that in the object file the vtable symbol does in fact point to the entry containing the pointer to the first virtual method.
Open Watcom has a utility that can show you the contents of an object file in a more assembly-like fashion, though notably not in the syntax that MASM uses. Running wdis t279.obj shows:
.new_section .rdata, "dr2"
0000 00 00 00 00 .long ??_R4Foo##6B#
0004 ??_7Foo##6B#:
0004 00 00 00 00 .long ?GetValue#Foo##UAEHXZ

Linux how to invoke a system call via sysenter? [duplicate]

How can we implement the system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you can also show the code for amd64 platform.
I know in x86, we can use
__asm__(
" movl $1, %eax \n"
" movl $0, %ebx \n"
" call *%gs:0x10 \n"
);
to route to sysenter indirectly.
But how can we code using sysenter/syscall directly to issue a system call?
I find some material http://damocles.blogbus.com/tag/sysenter/ . But still find it difficult to figure out.
First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) means as part of an asm() statement.
You also need asm volatile because that's not implicit for Extended asm statements with 1 or more output operands.
I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. Here's the source of the program without an implementation of the actual system call :
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size);
int main(void)
{
const char hello[] = "Hello world!\n";
my_write(1, hello, sizeof(hello));
return 0;
}
You can see that I named my custom system call function as my_write in order to avoid name clashes with the "normal" write, provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using the 128th interrupt vector, e.g. by calling int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is achieved by the VDSO virtually mapped to each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications - instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls - some system calls like getpid and clock_gettime are so simple the kernel can export code + data that runs in user-space so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.
// i386 Linux
#include <asm/unistd.h> // compile with -m32 for 32 bit call numbers
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
: "memory" // the kernel dereferences pointer args
);
return ret;
}
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes to the eax register, while all the parameters needed for the syscall go into respectively ebx, ecx, edx, esi, edi, and ebp. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Keep in mind that the clobber list also contains the memory parameter, which means that the instruction listed in the instruction list references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications - it really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return).
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
// x86-64 Linux
#include <asm/unistd.h> // compile without -m32 for 64 bit call numbers
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"syscall"
: "=a" (ret)
// EDI RSI RDX
: "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
: "rcx", "r11", "memory"
);
return ret;
}
(See it compile on Godbolt)
Do notice how practically the only thing that needed changing were the register names, and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically provides appropriate move instructions needed for executing the instruction list.
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers. And even different arg-passing conventions for 32-bit.
Explicit register variables
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Explicit-Register-Variables.html#Explicit-Reg-Vars)
I believe this should now generally be the recommended approach over register constraints because:
it can represent all registers, including r8, r9 and r10 which are used for system call arguments: How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?
it's the only optimal option for other ISAs besides x86 like ARM, which don't have the magic register constraint names: How to specify an individual register as constraint in ARM GCC inline assembly? (besides using a temporary register + clobbers + and an extra mov instruction)
I'll argue that this syntax is more readable than using the single letter mnemonics such as S -> rsi
Register variables are used for example in glibc 2.29, see: sysdeps/unix/sysv/linux/x86_64/sysdep.h.
main_reg.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
register int64_t rax __asm__ ("rax") = 1;
register int rdi __asm__ ("rdi") = fd;
register const void *rsi __asm__ ("rsi") = buf;
register size_t rdx __asm__ ("rdx") = size;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi), "r" (rsi), "r" (rdx)
: "rcx", "r11", "memory"
);
return rax;
}
void my_exit(int exit_status) {
register int64_t rax __asm__ ("rax") = 60;
register int rdi __asm__ ("rdi") = exit_status;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Compile and run:
gcc -O3 -std=c99 -ggdb3 -ffreestanding -nostdlib -Wall -Werror \
-pedantic -o main_reg.out main_reg.c
./main.out
echo $?
Output
hello world
0
For comparison, the following analogous to How to invoke a system call via syscall or sysenter in inline assembly? produces equivalent assembly:
main_constraint.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (1), "D" (fd), "S" (buf), "d" (size)
: "rcx", "r11", "memory"
);
return ret;
}
void my_exit(int exit_status) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (60), "D" (exit_status)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Disassembly of both with:
objdump -d main_reg.out
is almost identical, here is the main_reg.c one:
Disassembly of section .text:
0000000000001000 <my_write>:
1000: b8 01 00 00 00 mov $0x1,%eax
1005: 0f 05 syscall
1007: c3 retq
1008: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
100f: 00
0000000000001010 <my_exit>:
1010: b8 3c 00 00 00 mov $0x3c,%eax
1015: 0f 05 syscall
1017: c3 retq
1018: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101f: 00
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: bf 01 00 00 00 mov $0x1,%edi
102a: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: ba 0d 00 00 00 mov $0xd,%edx
1043: b8 01 00 00 00 mov $0x1,%eax
1048: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104f: 0a
1050: 0f 05 syscall
1052: 31 ff xor %edi,%edi
1054: 48 83 f8 0d cmp $0xd,%rax
1058: b8 3c 00 00 00 mov $0x3c,%eax
105d: 40 0f 95 c7 setne %dil
1061: 0f 05 syscall
1063: c3 retq
So we see that GCC inlined those tiny syscall functions as would be desired.
my_write and my_exit are the same for both, but _start in main_constraint.c is slightly different:
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102a: ba 0d 00 00 00 mov $0xd,%edx
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: b8 01 00 00 00 mov $0x1,%eax
1043: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104a: 0a
104b: 89 c7 mov %eax,%edi
104d: 0f 05 syscall
104f: 31 ff xor %edi,%edi
1051: 48 83 f8 0d cmp $0xd,%rax
1055: b8 3c 00 00 00 mov $0x3c,%eax
105a: 40 0f 95 c7 setne %dil
105e: 0f 05 syscall
1060: c3 retq
It is interesting to observe that in this case GCC found a slightly shorter equivalent encoding by picking:
104b: 89 c7 mov %eax,%edi
to set the fd to 1, which equals the 1 from the syscall number, rather than a more direct:
1025: bf 01 00 00 00 mov $0x1,%edi
For an in-depth discussion of the calling conventions, see also: What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Tested in Ubuntu 18.10, GCC 8.2.0.

How to invoke a system call via syscall or sysenter in inline assembly?

How can we implement the system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you can also show the code for amd64 platform.
I know in x86, we can use
__asm__(
" movl $1, %eax \n"
" movl $0, %ebx \n"
" call *%gs:0x10 \n"
);
to route to sysenter indirectly.
But how can we code using sysenter/syscall directly to issue a system call?
I find some material http://damocles.blogbus.com/tag/sysenter/ . But still find it difficult to figure out.
First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) means as part of an asm() statement.
You also need asm volatile because that's not implicit for Extended asm statements with 1 or more output operands.
I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. Here's the source of the program without an implementation of the actual system call :
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size);
int main(void)
{
const char hello[] = "Hello world!\n";
my_write(1, hello, sizeof(hello));
return 0;
}
You can see that I named my custom system call function as my_write in order to avoid name clashes with the "normal" write, provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using the 128th interrupt vector, e.g. by calling int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is achieved by the VDSO virtually mapped to each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications - instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls - some system calls like getpid and clock_gettime are so simple the kernel can export code + data that runs in user-space so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.
// i386 Linux
#include <asm/unistd.h> // compile with -m32 for 32 bit call numbers
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
: "memory" // the kernel dereferences pointer args
);
return ret;
}
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes to the eax register, while all the parameters needed for the syscall go into respectively ebx, ecx, edx, esi, edi, and ebp. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Keep in mind that the clobber list also contains the memory parameter, which means that the instruction listed in the instruction list references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications - it really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return).
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
// x86-64 Linux
#include <asm/unistd.h> // compile without -m32 for 64 bit call numbers
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"syscall"
: "=a" (ret)
// EDI RSI RDX
: "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
: "rcx", "r11", "memory"
);
return ret;
}
(See it compile on Godbolt)
Do notice how practically the only thing that needed changing were the register names, and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically provides appropriate move instructions needed for executing the instruction list.
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers. And even different arg-passing conventions for 32-bit.
Explicit register variables
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Explicit-Register-Variables.html#Explicit-Reg-Vars)
I believe this should now generally be the recommended approach over register constraints because:
it can represent all registers, including r8, r9 and r10 which are used for system call arguments: How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?
it's the only optimal option for other ISAs besides x86 like ARM, which don't have the magic register constraint names: How to specify an individual register as constraint in ARM GCC inline assembly? (besides using a temporary register + clobbers + and an extra mov instruction)
I'll argue that this syntax is more readable than using the single letter mnemonics such as S -> rsi
Register variables are used for example in glibc 2.29, see: sysdeps/unix/sysv/linux/x86_64/sysdep.h.
main_reg.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
register int64_t rax __asm__ ("rax") = 1;
register int rdi __asm__ ("rdi") = fd;
register const void *rsi __asm__ ("rsi") = buf;
register size_t rdx __asm__ ("rdx") = size;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi), "r" (rsi), "r" (rdx)
: "rcx", "r11", "memory"
);
return rax;
}
void my_exit(int exit_status) {
register int64_t rax __asm__ ("rax") = 60;
register int rdi __asm__ ("rdi") = exit_status;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Compile and run:
gcc -O3 -std=c99 -ggdb3 -ffreestanding -nostdlib -Wall -Werror \
-pedantic -o main_reg.out main_reg.c
./main.out
echo $?
Output
hello world
0
For comparison, the following analogous to How to invoke a system call via syscall or sysenter in inline assembly? produces equivalent assembly:
main_constraint.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (1), "D" (fd), "S" (buf), "d" (size)
: "rcx", "r11", "memory"
);
return ret;
}
void my_exit(int exit_status) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (60), "D" (exit_status)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Disassembly of both with:
objdump -d main_reg.out
is almost identical, here is the main_reg.c one:
Disassembly of section .text:
0000000000001000 <my_write>:
1000: b8 01 00 00 00 mov $0x1,%eax
1005: 0f 05 syscall
1007: c3 retq
1008: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
100f: 00
0000000000001010 <my_exit>:
1010: b8 3c 00 00 00 mov $0x3c,%eax
1015: 0f 05 syscall
1017: c3 retq
1018: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101f: 00
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: bf 01 00 00 00 mov $0x1,%edi
102a: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: ba 0d 00 00 00 mov $0xd,%edx
1043: b8 01 00 00 00 mov $0x1,%eax
1048: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104f: 0a
1050: 0f 05 syscall
1052: 31 ff xor %edi,%edi
1054: 48 83 f8 0d cmp $0xd,%rax
1058: b8 3c 00 00 00 mov $0x3c,%eax
105d: 40 0f 95 c7 setne %dil
1061: 0f 05 syscall
1063: c3 retq
So we see that GCC inlined those tiny syscall functions as would be desired.
my_write and my_exit are the same for both, but _start in main_constraint.c is slightly different:
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102a: ba 0d 00 00 00 mov $0xd,%edx
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: b8 01 00 00 00 mov $0x1,%eax
1043: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104a: 0a
104b: 89 c7 mov %eax,%edi
104d: 0f 05 syscall
104f: 31 ff xor %edi,%edi
1051: 48 83 f8 0d cmp $0xd,%rax
1055: b8 3c 00 00 00 mov $0x3c,%eax
105a: 40 0f 95 c7 setne %dil
105e: 0f 05 syscall
1060: c3 retq
It is interesting to observe that in this case GCC found a slightly shorter equivalent encoding by picking:
104b: 89 c7 mov %eax,%edi
to set the fd to 1, which equals the 1 from the syscall number, rather than a more direct:
1025: bf 01 00 00 00 mov $0x1,%edi
For an in-depth discussion of the calling conventions, see also: What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Tested in Ubuntu 18.10, GCC 8.2.0.

Resources