How to ShellCode in linux with x64 processor. - shellcode

I am working my way through this tutorial, but it is written for 32 bit processors. I have found this note on 64 bit linux syscall. I can compile, run and objdump it.
My question is, why is the following code generated a segmentation fault.
/* shellcodetest.c */
char code[] = "\x48\x31\xc0\xb0\x3c\x48\x31\xff\x0f\x05";
int main(int argc, char **argv){
int (*func)();
func = (int (*)()) code;
(int)(*func)();
}
/*
exitt: file format elf64-x86-64
Disassembly of section .text:
0000000000400078 <_start>:
400078: 48 31 c0 xor %rax,%rax
40007b: b0 3c mov $0x3c,%al
40007d: 48 31 ff xor %rdi,%rdi
400080: 0f 05 syscall
*/
For reference syscall 60 (\x3c) is exit(). So this should just be calling exit(0);

The answer was to turn of DEP as suggested by #tylerl tylerl in the Information Security forum. #Adam-Miller explains how in the answer to
how to turn off DEP (Data Execution Prevention) without reboot? where he says, 'In linux, you can use the compiler flag "-z execstack"'

Related

Unanticipated segmentation fault in C

I'm writing a Linux shell code exploit. My target C code is:
char code[] = "\xb0\x01\x31\xdb\xcd\x80";
int main(int argc, char **argv)
{
int(*func)();
func = (int (*)()) code;
(Int)(*func)();
}
Why does compiling and running this C program raise a segmentation fault error? The string is shell code that exits the program using the system call Int 0x80/EAX=1. The original exploit code in assembly is:
b0 01 mov al,0x1
31 db xor ebx,ebx
cd 80 int 0x80
You are not setting eax=0x1, you are setting al=0x1, so if you don't know what instructions are executed before that your shellcode, you will have eax=xxxxxx01.
As the comments said you, you have to do a xor eax, eax on the beginning of your shellcode.

Shell launched from shellcode immediatly stop upon launching

I will preface by saying that i did the research but couldn't found the answer to my basic problem.
I want to launch a shell from a shellcode through a basic program.
This is the shellcode used, that should be working by now:
0: 31 c0 xor eax, eax
2: 50 push eax
3: 68 2f 2f 73 68 push 0x68732f2f
8: 68 2f 62 69 6e push 0x6e69622f
d: 89 e3 mov ebx, esp
f: 50 push eax
10: 53 push ebx
11: 89 e1 mov ecx, esp
13: b0 0b mov al, 0xb
15: 31 d2 xor edx, edx
17: cd 80 int 0x80
The weird thing is this shellcode is working when used like so:
char *shellcode = "\x31[...]x80";
int main(void)
{
(*(void(*)()) shellcode)();
return 0;
}
But not when read from stdin with this program (compiled with the following flags:
gcc vuln.c -o vuln -fno-stack-protector -m32 -z execstack):
Code:
#include [...]
typedef void (*func)(void);
int main(void)
{
char input[4096];
read(0, input, 4096);
((func)&input)();
return 0;
}
What happen with the second program is that the program simply quits with no error code and no shell spawned.
Strace shows me that in the second program, a shell is properly launched:
read(0, "1\300Ph//shh/bin\211\343PS\211\341\260\v1\322\315\200", 4096) = 26
execve("/bin//sh", ["/bin//sh"], NULL) = 0
strace: [ Process PID=3139 runs in 64 bit mode. ]
But this line near the end is highly suspicious since I'm not asked to do anything:
read(0, "", 8192) = 0
It seems that I'm sending a null byte somehow to the spawned shell and it kills it. I first though that I'm not properly setting up my shellcode file, but those are the commands I use:
perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\x31\xd2\xcd\x80"' > payload.bin
./vuln < payload.bin
I also tried using this command but got the same result:
perl -e 'print "[...]" | ./vuln
I also checked the chars inside the file, but the file weights 25 oct. for a shellcode of the same size so this shouldn't be the problem.
Am i using the proper way to give a shellcode from stdin or is there another way ?
If no, where does the problem come from?
Thanks
The problem is that in the second program, you've redirected standard input to the pipe. The shell that's spawned tries to read from the pipe, gets EOF, so it exits.
You can redirect standard input back to the terminal before running the shellcode.
#include [...]
typedef void (*func)(void);
int main(void)
{
char input[4096];
read(0, input, 4096);
freopen("/dev/tty", "r", stdin);
((func)&input)();
return 0;
}

Linux how to invoke a system call via sysenter? [duplicate]

How can we implement the system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you can also show the code for amd64 platform.
I know in x86, we can use
__asm__(
" movl $1, %eax \n"
" movl $0, %ebx \n"
" call *%gs:0x10 \n"
);
to route to sysenter indirectly.
But how can we code using sysenter/syscall directly to issue a system call?
I find some material http://damocles.blogbus.com/tag/sysenter/ . But still find it difficult to figure out.
First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) means as part of an asm() statement.
You also need asm volatile because that's not implicit for Extended asm statements with 1 or more output operands.
I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. Here's the source of the program without an implementation of the actual system call :
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size);
int main(void)
{
const char hello[] = "Hello world!\n";
my_write(1, hello, sizeof(hello));
return 0;
}
You can see that I named my custom system call function as my_write in order to avoid name clashes with the "normal" write, provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using the 128th interrupt vector, e.g. by calling int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is achieved by the VDSO virtually mapped to each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications - instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls - some system calls like getpid and clock_gettime are so simple the kernel can export code + data that runs in user-space so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.
// i386 Linux
#include <asm/unistd.h> // compile with -m32 for 32 bit call numbers
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
: "memory" // the kernel dereferences pointer args
);
return ret;
}
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes to the eax register, while all the parameters needed for the syscall go into respectively ebx, ecx, edx, esi, edi, and ebp. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Keep in mind that the clobber list also contains the memory parameter, which means that the instruction listed in the instruction list references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications - it really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return).
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
// x86-64 Linux
#include <asm/unistd.h> // compile without -m32 for 64 bit call numbers
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"syscall"
: "=a" (ret)
// EDI RSI RDX
: "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
: "rcx", "r11", "memory"
);
return ret;
}
(See it compile on Godbolt)
Do notice how practically the only thing that needed changing were the register names, and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically provides appropriate move instructions needed for executing the instruction list.
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers. And even different arg-passing conventions for 32-bit.
Explicit register variables
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Explicit-Register-Variables.html#Explicit-Reg-Vars)
I believe this should now generally be the recommended approach over register constraints because:
it can represent all registers, including r8, r9 and r10 which are used for system call arguments: How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?
it's the only optimal option for other ISAs besides x86 like ARM, which don't have the magic register constraint names: How to specify an individual register as constraint in ARM GCC inline assembly? (besides using a temporary register + clobbers + and an extra mov instruction)
I'll argue that this syntax is more readable than using the single letter mnemonics such as S -> rsi
Register variables are used for example in glibc 2.29, see: sysdeps/unix/sysv/linux/x86_64/sysdep.h.
main_reg.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
register int64_t rax __asm__ ("rax") = 1;
register int rdi __asm__ ("rdi") = fd;
register const void *rsi __asm__ ("rsi") = buf;
register size_t rdx __asm__ ("rdx") = size;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi), "r" (rsi), "r" (rdx)
: "rcx", "r11", "memory"
);
return rax;
}
void my_exit(int exit_status) {
register int64_t rax __asm__ ("rax") = 60;
register int rdi __asm__ ("rdi") = exit_status;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Compile and run:
gcc -O3 -std=c99 -ggdb3 -ffreestanding -nostdlib -Wall -Werror \
-pedantic -o main_reg.out main_reg.c
./main.out
echo $?
Output
hello world
0
For comparison, the following analogous to How to invoke a system call via syscall or sysenter in inline assembly? produces equivalent assembly:
main_constraint.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (1), "D" (fd), "S" (buf), "d" (size)
: "rcx", "r11", "memory"
);
return ret;
}
void my_exit(int exit_status) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (60), "D" (exit_status)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Disassembly of both with:
objdump -d main_reg.out
is almost identical, here is the main_reg.c one:
Disassembly of section .text:
0000000000001000 <my_write>:
1000: b8 01 00 00 00 mov $0x1,%eax
1005: 0f 05 syscall
1007: c3 retq
1008: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
100f: 00
0000000000001010 <my_exit>:
1010: b8 3c 00 00 00 mov $0x3c,%eax
1015: 0f 05 syscall
1017: c3 retq
1018: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101f: 00
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: bf 01 00 00 00 mov $0x1,%edi
102a: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: ba 0d 00 00 00 mov $0xd,%edx
1043: b8 01 00 00 00 mov $0x1,%eax
1048: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104f: 0a
1050: 0f 05 syscall
1052: 31 ff xor %edi,%edi
1054: 48 83 f8 0d cmp $0xd,%rax
1058: b8 3c 00 00 00 mov $0x3c,%eax
105d: 40 0f 95 c7 setne %dil
1061: 0f 05 syscall
1063: c3 retq
So we see that GCC inlined those tiny syscall functions as would be desired.
my_write and my_exit are the same for both, but _start in main_constraint.c is slightly different:
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102a: ba 0d 00 00 00 mov $0xd,%edx
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: b8 01 00 00 00 mov $0x1,%eax
1043: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104a: 0a
104b: 89 c7 mov %eax,%edi
104d: 0f 05 syscall
104f: 31 ff xor %edi,%edi
1051: 48 83 f8 0d cmp $0xd,%rax
1055: b8 3c 00 00 00 mov $0x3c,%eax
105a: 40 0f 95 c7 setne %dil
105e: 0f 05 syscall
1060: c3 retq
It is interesting to observe that in this case GCC found a slightly shorter equivalent encoding by picking:
104b: 89 c7 mov %eax,%edi
to set the fd to 1, which equals the 1 from the syscall number, rather than a more direct:
1025: bf 01 00 00 00 mov $0x1,%edi
For an in-depth discussion of the calling conventions, see also: What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Tested in Ubuntu 18.10, GCC 8.2.0.

memory access when writing a linux kernel module in assembler

i try writing a kernel module in assembler. In one time i needed a global vars. I define a dword in .data (or .bss) section, and in init function i try add 1 to var. My program seccesfully make, but insmod sey me:
$ sudo insmod ./test.ko
insmod: ERROR: could not insert module ./test.ko: Invalid module format
it's my assembler code in nasm:
[bits 64]
global init
global cleanup
extern printk
section .data
init_mess db "Hello!", 10, 0
g_var dd 0
section .text
init:
push rbp
mov rbp, rsp
inc dword [g_var]
mov rdi, init_mess
xor rax, rax
call printk
xor rax, rax
mov rsp, rbp
pop rbp
ret
cleanup:
xor rax, rax
ret
if i write adding in C code, all work good:
static i = 0;
static int __init main_init(void) { i++; return init(); }
But in this objdump -d test.ko write a very stainght code for me:
0000000000000000 <init_module>:
0: 55 push %rbp
1: ff 05 00 00 00 00 incl 0x0(%rip) # 7 <init_module+0x7>
7: 48 89 e5 mov %rsp,%rbp
a: e8 00 00 00 00 callq f <init_module+0xf>
f: 5d pop %rbp
10: c3 retq
What does this mean (incl 0x0(%rip))? How can I access memory? Please, help me :)
(My system is archlinux x86_64)
my C code for correct make a module:
#include <linux/module.h>
#include <linux/init.h>
MODULE_AUTHOR("Actics");
MODULE_DESCRIPTION("Description");
MODULE_LICENSE("GPL");
extern int init(void);
extern int cleanup(void);
static int __init main_init(void) { return init(); }
static void __exit main_cleanup(void) { cleanup(); }
module_init(main_init);
module_exit(main_cleanup);
and my Makefile:
obj-m := test.o
test-objs := inthan.o module.o
KVERSION = $(shell uname -r)
inthan.o: inthan.asm
nasm -f elf64 -o $# $^
build:
make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules
Kernel mode lives in the "negative" (ie. top) part of the address space, where 32 bit absolute addresses can not be used (because they are not sign-extended). As you have noticed, gcc uses rip-relative addresses to work around this problem which gives offsets from the current instruction pointer. You can make nasm do the same by using the DEFAULT REL directive. See the relevant section in the nasm documentation.
you can always use inline assembly
asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \
54 : "=&r" (flag), "=r" (roksum) \
55 : "1" (addr), "g" ((long)(size)), \
56 "rm" (limit));

How to invoke a system call via syscall or sysenter in inline assembly?

How can we implement the system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you can also show the code for amd64 platform.
I know in x86, we can use
__asm__(
" movl $1, %eax \n"
" movl $0, %ebx \n"
" call *%gs:0x10 \n"
);
to route to sysenter indirectly.
But how can we code using sysenter/syscall directly to issue a system call?
I find some material http://damocles.blogbus.com/tag/sysenter/ . But still find it difficult to figure out.
First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) means as part of an asm() statement.
You also need asm volatile because that's not implicit for Extended asm statements with 1 or more output operands.
I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. Here's the source of the program without an implementation of the actual system call :
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size);
int main(void)
{
const char hello[] = "Hello world!\n";
my_write(1, hello, sizeof(hello));
return 0;
}
You can see that I named my custom system call function as my_write in order to avoid name clashes with the "normal" write, provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using the 128th interrupt vector, e.g. by calling int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is achieved by the VDSO virtually mapped to each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications - instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls - some system calls like getpid and clock_gettime are so simple the kernel can export code + data that runs in user-space so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.
// i386 Linux
#include <asm/unistd.h> // compile with -m32 for 32 bit call numbers
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
: "memory" // the kernel dereferences pointer args
);
return ret;
}
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes to the eax register, while all the parameters needed for the syscall go into respectively ebx, ecx, edx, esi, edi, and ebp. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Keep in mind that the clobber list also contains the memory parameter, which means that the instruction listed in the instruction list references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications - it really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return).
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
// x86-64 Linux
#include <asm/unistd.h> // compile without -m32 for 64 bit call numbers
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"syscall"
: "=a" (ret)
// EDI RSI RDX
: "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
: "rcx", "r11", "memory"
);
return ret;
}
(See it compile on Godbolt)
Do notice how practically the only thing that needed changing were the register names, and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically provides appropriate move instructions needed for executing the instruction list.
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers. And even different arg-passing conventions for 32-bit.
Explicit register variables
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Explicit-Register-Variables.html#Explicit-Reg-Vars)
I believe this should now generally be the recommended approach over register constraints because:
it can represent all registers, including r8, r9 and r10 which are used for system call arguments: How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?
it's the only optimal option for other ISAs besides x86 like ARM, which don't have the magic register constraint names: How to specify an individual register as constraint in ARM GCC inline assembly? (besides using a temporary register + clobbers + and an extra mov instruction)
I'll argue that this syntax is more readable than using the single letter mnemonics such as S -> rsi
Register variables are used for example in glibc 2.29, see: sysdeps/unix/sysv/linux/x86_64/sysdep.h.
main_reg.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
register int64_t rax __asm__ ("rax") = 1;
register int rdi __asm__ ("rdi") = fd;
register const void *rsi __asm__ ("rsi") = buf;
register size_t rdx __asm__ ("rdx") = size;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi), "r" (rsi), "r" (rdx)
: "rcx", "r11", "memory"
);
return rax;
}
void my_exit(int exit_status) {
register int64_t rax __asm__ ("rax") = 60;
register int rdi __asm__ ("rdi") = exit_status;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Compile and run:
gcc -O3 -std=c99 -ggdb3 -ffreestanding -nostdlib -Wall -Werror \
-pedantic -o main_reg.out main_reg.c
./main.out
echo $?
Output
hello world
0
For comparison, the following analogous to How to invoke a system call via syscall or sysenter in inline assembly? produces equivalent assembly:
main_constraint.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (1), "D" (fd), "S" (buf), "d" (size)
: "rcx", "r11", "memory"
);
return ret;
}
void my_exit(int exit_status) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (60), "D" (exit_status)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Disassembly of both with:
objdump -d main_reg.out
is almost identical, here is the main_reg.c one:
Disassembly of section .text:
0000000000001000 <my_write>:
1000: b8 01 00 00 00 mov $0x1,%eax
1005: 0f 05 syscall
1007: c3 retq
1008: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
100f: 00
0000000000001010 <my_exit>:
1010: b8 3c 00 00 00 mov $0x3c,%eax
1015: 0f 05 syscall
1017: c3 retq
1018: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101f: 00
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: bf 01 00 00 00 mov $0x1,%edi
102a: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: ba 0d 00 00 00 mov $0xd,%edx
1043: b8 01 00 00 00 mov $0x1,%eax
1048: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104f: 0a
1050: 0f 05 syscall
1052: 31 ff xor %edi,%edi
1054: 48 83 f8 0d cmp $0xd,%rax
1058: b8 3c 00 00 00 mov $0x3c,%eax
105d: 40 0f 95 c7 setne %dil
1061: 0f 05 syscall
1063: c3 retq
So we see that GCC inlined those tiny syscall functions as would be desired.
my_write and my_exit are the same for both, but _start in main_constraint.c is slightly different:
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102a: ba 0d 00 00 00 mov $0xd,%edx
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: b8 01 00 00 00 mov $0x1,%eax
1043: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104a: 0a
104b: 89 c7 mov %eax,%edi
104d: 0f 05 syscall
104f: 31 ff xor %edi,%edi
1051: 48 83 f8 0d cmp $0xd,%rax
1055: b8 3c 00 00 00 mov $0x3c,%eax
105a: 40 0f 95 c7 setne %dil
105e: 0f 05 syscall
1060: c3 retq
It is interesting to observe that in this case GCC found a slightly shorter equivalent encoding by picking:
104b: 89 c7 mov %eax,%edi
to set the fd to 1, which equals the 1 from the syscall number, rather than a more direct:
1025: bf 01 00 00 00 mov $0x1,%edi
For an in-depth discussion of the calling conventions, see also: What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Tested in Ubuntu 18.10, GCC 8.2.0.

Resources