How can we implement the system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you can also show the code for amd64 platform.
I know in x86, we can use
__asm__(
" movl $1, %eax \n"
" movl $0, %ebx \n"
" call *%gs:0x10 \n"
);
to route to sysenter indirectly.
But how can we code using sysenter/syscall directly to issue a system call?
I find some material http://damocles.blogbus.com/tag/sysenter/ . But still find it difficult to figure out.
First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) means as part of an asm() statement.
You also need asm volatile because that's not implicit for Extended asm statements with 1 or more output operands.
I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. Here's the source of the program without an implementation of the actual system call :
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size);
int main(void)
{
const char hello[] = "Hello world!\n";
my_write(1, hello, sizeof(hello));
return 0;
}
You can see that I named my custom system call function as my_write in order to avoid name clashes with the "normal" write, provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using the 128th interrupt vector, e.g. by calling int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is achieved by the VDSO virtually mapped to each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications - instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls - some system calls like getpid and clock_gettime are so simple the kernel can export code + data that runs in user-space so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.
// i386 Linux
#include <asm/unistd.h> // compile with -m32 for 32 bit call numbers
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
: "memory" // the kernel dereferences pointer args
);
return ret;
}
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes to the eax register, while all the parameters needed for the syscall go into respectively ebx, ecx, edx, esi, edi, and ebp. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Keep in mind that the clobber list also contains the memory parameter, which means that the instruction listed in the instruction list references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications - it really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return).
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
// x86-64 Linux
#include <asm/unistd.h> // compile without -m32 for 64 bit call numbers
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"syscall"
: "=a" (ret)
// EDI RSI RDX
: "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
: "rcx", "r11", "memory"
);
return ret;
}
(See it compile on Godbolt)
Do notice how practically the only thing that needed changing were the register names, and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically provides appropriate move instructions needed for executing the instruction list.
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers. And even different arg-passing conventions for 32-bit.
Explicit register variables
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Explicit-Register-Variables.html#Explicit-Reg-Vars)
I believe this should now generally be the recommended approach over register constraints because:
it can represent all registers, including r8, r9 and r10 which are used for system call arguments: How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?
it's the only optimal option for other ISAs besides x86 like ARM, which don't have the magic register constraint names: How to specify an individual register as constraint in ARM GCC inline assembly? (besides using a temporary register + clobbers + and an extra mov instruction)
I'll argue that this syntax is more readable than using the single letter mnemonics such as S -> rsi
Register variables are used for example in glibc 2.29, see: sysdeps/unix/sysv/linux/x86_64/sysdep.h.
main_reg.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
register int64_t rax __asm__ ("rax") = 1;
register int rdi __asm__ ("rdi") = fd;
register const void *rsi __asm__ ("rsi") = buf;
register size_t rdx __asm__ ("rdx") = size;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi), "r" (rsi), "r" (rdx)
: "rcx", "r11", "memory"
);
return rax;
}
void my_exit(int exit_status) {
register int64_t rax __asm__ ("rax") = 60;
register int rdi __asm__ ("rdi") = exit_status;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Compile and run:
gcc -O3 -std=c99 -ggdb3 -ffreestanding -nostdlib -Wall -Werror \
-pedantic -o main_reg.out main_reg.c
./main.out
echo $?
Output
hello world
0
For comparison, the following analogous to How to invoke a system call via syscall or sysenter in inline assembly? produces equivalent assembly:
main_constraint.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (1), "D" (fd), "S" (buf), "d" (size)
: "rcx", "r11", "memory"
);
return ret;
}
void my_exit(int exit_status) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (60), "D" (exit_status)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Disassembly of both with:
objdump -d main_reg.out
is almost identical, here is the main_reg.c one:
Disassembly of section .text:
0000000000001000 <my_write>:
1000: b8 01 00 00 00 mov $0x1,%eax
1005: 0f 05 syscall
1007: c3 retq
1008: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
100f: 00
0000000000001010 <my_exit>:
1010: b8 3c 00 00 00 mov $0x3c,%eax
1015: 0f 05 syscall
1017: c3 retq
1018: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101f: 00
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: bf 01 00 00 00 mov $0x1,%edi
102a: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: ba 0d 00 00 00 mov $0xd,%edx
1043: b8 01 00 00 00 mov $0x1,%eax
1048: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104f: 0a
1050: 0f 05 syscall
1052: 31 ff xor %edi,%edi
1054: 48 83 f8 0d cmp $0xd,%rax
1058: b8 3c 00 00 00 mov $0x3c,%eax
105d: 40 0f 95 c7 setne %dil
1061: 0f 05 syscall
1063: c3 retq
So we see that GCC inlined those tiny syscall functions as would be desired.
my_write and my_exit are the same for both, but _start in main_constraint.c is slightly different:
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102a: ba 0d 00 00 00 mov $0xd,%edx
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: b8 01 00 00 00 mov $0x1,%eax
1043: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104a: 0a
104b: 89 c7 mov %eax,%edi
104d: 0f 05 syscall
104f: 31 ff xor %edi,%edi
1051: 48 83 f8 0d cmp $0xd,%rax
1055: b8 3c 00 00 00 mov $0x3c,%eax
105a: 40 0f 95 c7 setne %dil
105e: 0f 05 syscall
1060: c3 retq
It is interesting to observe that in this case GCC found a slightly shorter equivalent encoding by picking:
104b: 89 c7 mov %eax,%edi
to set the fd to 1, which equals the 1 from the syscall number, rather than a more direct:
1025: bf 01 00 00 00 mov $0x1,%edi
For an in-depth discussion of the calling conventions, see also: What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Tested in Ubuntu 18.10, GCC 8.2.0.
Related
I am currently practicing with assembly reading by disassemblying C programs and trying to understand what they do.
I am stuck with a trivial one: a simple hello world program.
#include <stdio.h>
#include <stdlib.h>
int main() {
printf("Hello, world!");
return(0);
}
When I disassemble the main:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400526 <+0>: push rbp
0x0000000000400527 <+1>: mov rbp,rsp
0x000000000040052a <+4>: mov edi,0x4005c4
0x000000000040052f <+9>: mov eax,0x0
0x0000000000400534 <+14>: call 0x400400 <printf#plt>
0x0000000000400539 <+19>: mov eax,0x0
0x000000000040053e <+24>: pop rbp
0x000000000040053f <+25>: ret
I understand the first two lines: the base pointer is saved on the stack (by push rbp, which causes the value of the stack pointer to be decreased by 8, because it has "grown") and the value of the stack pointer is saved in the base pointer (so that parameters and local variable can be easily reached through positive and negative offsets, respectively, while the stack can keep "growing").
The third line presents the first issue: why is 0x4005c4 (the address of the "Hello, World!" string) moved in the edi register instead of moving it on the stack? Shouldn't the printf function take the address of that string as parameter? For what I know, functions take parameters from the stack (but here, it looks like the parameter is put in that register: edi)
On another post here on StackOverflow I read that "printf#ptl" is like a stub function that calls the real printf function. I tried to disassemble that function, but it gets even more confusing:
(gdb) disassemble printf
Dump of assembler code for function __printf:
0x00007ffff7a637b0 <+0>: sub rsp,0xd8
0x00007ffff7a637b7 <+7>: test al,al
0x00007ffff7a637b9 <+9>: mov QWORD PTR [rsp+0x28],rsi
0x00007ffff7a637be <+14>: mov QWORD PTR [rsp+0x30],rdx
0x00007ffff7a637c3 <+19>: mov QWORD PTR [rsp+0x38],rcx
0x00007ffff7a637c8 <+24>: mov QWORD PTR [rsp+0x40],r8
0x00007ffff7a637cd <+29>: mov QWORD PTR [rsp+0x48],r9
0x00007ffff7a637d2 <+34>: je 0x7ffff7a6380b <__printf+91>
0x00007ffff7a637d4 <+36>: movaps XMMWORD PTR [rsp+0x50],xmm0
0x00007ffff7a637d9 <+41>: movaps XMMWORD PTR [rsp+0x60],xmm1
0x00007ffff7a637de <+46>: movaps XMMWORD PTR [rsp+0x70],xmm2
0x00007ffff7a637e3 <+51>: movaps XMMWORD PTR [rsp+0x80],xmm3
0x00007ffff7a637eb <+59>: movaps XMMWORD PTR [rsp+0x90],xmm4
0x00007ffff7a637f3 <+67>: movaps XMMWORD PTR [rsp+0xa0],xmm5
0x00007ffff7a637fb <+75>: movaps XMMWORD PTR [rsp+0xb0],xmm6
0x00007ffff7a63803 <+83>: movaps XMMWORD PTR [rsp+0xc0],xmm7
0x00007ffff7a6380b <+91>: lea rax,[rsp+0xe0]
0x00007ffff7a63813 <+99>: mov rsi,rdi
0x00007ffff7a63816 <+102>: lea rdx,[rsp+0x8]
0x00007ffff7a6381b <+107>: mov QWORD PTR [rsp+0x10],rax
0x00007ffff7a63820 <+112>: lea rax,[rsp+0x20]
0x00007ffff7a63825 <+117>: mov DWORD PTR [rsp+0x8],0x8
0x00007ffff7a6382d <+125>: mov DWORD PTR [rsp+0xc],0x30
0x00007ffff7a63835 <+133>: mov QWORD PTR [rsp+0x18],rax
0x00007ffff7a6383a <+138>: mov rax,QWORD PTR [rip+0x36d70f] # 0x7ffff7dd0f50
0x00007ffff7a63841 <+145>: mov rdi,QWORD PTR [rax]
0x00007ffff7a63844 <+148>: call 0x7ffff7a5b130 <_IO_vfprintf_internal>
0x00007ffff7a63849 <+153>: add rsp,0xd8
0x00007ffff7a63850 <+160>: ret
End of assembler dump.
The two mov operations on eax (mov eax, 0x0) bother me a little as well, since I don't get they role in here (but I am more concerned with what I have just described).
Thank you in advance.
gcc is targeting the x86-64 System V ABI, used by all x86-64 systems other than Windows (for various historical reasons). Its calling convention passes the first few args in registers before falling back to the stack. (See also the Wikipedia basic summary of this calling convention.)
And yes, this is different from the crusty old 32-bit calling conventions that use the stack for everything. This is a Good Thing. See also the x86 tag wiki for more links to ABI docs, and tons of other stuff.
0x0000000000400526: push rbp
0x0000000000400527: mov rbp,rsp # stack-frame boilerplate
0x000000000040052a: mov edi,0x4005c4 # first arg
0x000000000040052f: mov eax,0x0 # 0 FP args in vector registers
0x0000000000400534: call 0x400400 <printf#plt>
0x0000000000400539: mov eax,0x0 # return 0. If you'd compiled with optimization, this and the previous mov would be xor eax,eax
0x000000000040053e: pop rbp # clean up stack frame
0x000000000040053f: ret
Pointers to static data fit into 32 bits, which is why it can use mov edi, imm32 instead of movabs rdi, imm64.
Floating-point args are passed in SSE registers (xmm0-xmm7), even to var-args functions. al indicates how many FP args are in vector registers. (Note that C's type promotion rules mean that float args to variadic functions are always promoted to double, which is why printf doesn't have any format specifiers for float, only double and long double).
printf#ptl is like a stub function that calls the real printf function.
Yes, that's right. The Procedure Linking Table entry starts out as a jmp to a dynamic linker routine that resolves the symbol and modifies the code in the PLT to turn it into a jmp directly to the address where libc's printf definition is mapped. printf is a weak alias for __printf, which is why gdb chooses the __printf label for that address, after you asked for disassembly of printf.
Dump of assembler code for function __printf:
0x00007ffff7a637b0 <+0>: sub rsp,0xd8 # reserve space
0x00007ffff7a637b7 <+7>: test al,al # check if there were any FP args
0x00007ffff7a637b9 <+9>: mov QWORD PTR [rsp+0x28],rsi # store the integer arg-passing registers to local scratch space
0x00007ffff7a637be <+14>: mov QWORD PTR [rsp+0x30],rdx
0x00007ffff7a637c3 <+19>: mov QWORD PTR [rsp+0x38],rcx
0x00007ffff7a637c8 <+24>: mov QWORD PTR [rsp+0x40],r8
0x00007ffff7a637cd <+29>: mov QWORD PTR [rsp+0x48],r9
0x00007ffff7a637d2 <+34>: je 0x7ffff7a6380b <__printf+91> # skip storing the FP arg-passing regs if there were no FP args
0x00007ffff7a637d4 <+36>: movaps XMMWORD PTR [rsp+0x50],xmm0
0x00007ffff7a637d9 <+41>: movaps XMMWORD PTR [rsp+0x60],xmm1
0x00007ffff7a637de <+46>: movaps XMMWORD PTR [rsp+0x70],xmm2
0x00007ffff7a637e3 <+51>: movaps XMMWORD PTR [rsp+0x80],xmm3
0x00007ffff7a637eb <+59>: movaps XMMWORD PTR [rsp+0x90],xmm4
0x00007ffff7a637f3 <+67>: movaps XMMWORD PTR [rsp+0xa0],xmm5
0x00007ffff7a637fb <+75>: movaps XMMWORD PTR [rsp+0xb0],xmm6
0x00007ffff7a63803 <+83>: movaps XMMWORD PTR [rsp+0xc0],xmm7
branch_target_from_test_je:
0x00007ffff7a6380b <+91>: lea rax,[rsp+0xe0] # some more stuff
So printf's implementation keeps the var-args handling simple by storing all the arg-passing registers (except the first one holding the format string) in order to local arrays. It can walk a pointer through them instead of needing switch-like code to extract the right integer or FP arg. It still needs to keep track of the first 5 integer and first 8 FP args, because they aren't contiguous with the rest of the args pushed by the caller onto the stack.
The Windows 64-bit calling convention's shadow space simplifies this by providing space for a function to dump its register args to the stack contiguous with the args already on the stack, but that's not worth wasting 32 bytes of stack on every call, IMO. (See my answer and comments on other answers on Why does Windows64 use a different calling convention from all other OSes on x86-64?)
there is nothing trivial about printf, not the first choice for what you are trying to do but, turned out to be not overly complicated.
Something simpler:
extern unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <fun+0x9>
9: 48 83 c4 08 add $0x8,%rsp
d: 83 c0 07 add $0x7,%eax
10: c3 retq
and the stack is used. eax used for the return.
now use a pointer
extern unsigned int more_fun ( unsigned int * );
unsigned int fun ( unsigned int x )
{
return(more_fun(&x)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 18 sub $0x18,%rsp
4: 89 7c 24 0c mov %edi,0xc(%rsp)
8: 48 8d 7c 24 0c lea 0xc(%rsp),%rdi
d: e8 00 00 00 00 callq 12 <fun+0x12>
12: 48 83 c4 18 add $0x18,%rsp
16: 83 c0 07 add $0x7,%eax
19: c3 retq
and there you go edi used as in your case.
two pointers
extern unsigned int more_fun ( unsigned int *, unsigned int * );
unsigned int fun ( unsigned int x, unsigned int y )
{
return(more_fun(&x,&y)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 18 sub $0x18,%rsp
4: 89 7c 24 0c mov %edi,0xc(%rsp)
8: 89 74 24 08 mov %esi,0x8(%rsp)
c: 48 8d 7c 24 0c lea 0xc(%rsp),%rdi
11: 48 8d 74 24 08 lea 0x8(%rsp),%rsi
16: e8 00 00 00 00 callq 1b <fun+0x1b>
1b: 48 83 c4 18 add $0x18,%rsp
1f: 83 c0 07 add $0x7,%eax
22: c3 retq
now edi and esi are used. all looking like it is the calling convention to me...
a string
extern unsigned int more_fun ( const char * );
unsigned int fun ( void )
{
return(more_fun("Hello World")+7);
}
0000000000000000 <fun>:
0: 48 83 ec 08 sub $0x8,%rsp
4: bf 00 00 00 00 mov $0x0,%edi
9: e8 00 00 00 00 callq e <fun+0xe>
e: 48 83 c4 08 add $0x8,%rsp
12: 83 c0 07 add $0x7,%eax
15: c3 retq
eax is not prepped as in printf, so perhaps eax has something to do with the number of parameters that follow, try putting more parameters on your printf and see if eax going in changes.
if I add -m32 on my command line then edi is not used.
00000000 <fun>:
0: 83 ec 18 sub $0x18,%esp
3: 68 00 00 00 00 push $0x0
8: e8 fc ff ff ff call 9 <fun+0x9>
d: 83 c4 1c add $0x1c,%esp
10: 83 c0 07 add $0x7,%eax
13: c3
I suspect the push is a placeholder for the linker to push the address to the string when the linker patches up the binary, this was just an object. So my guess is when you have a 64 bit pointer, the first one or two go into registers then the stack is used after it runs out of registers.
Obviously the compiler works so this is conforming to the compilers calling convention.
extern unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x+5)+7);
}
0000000000000000 <fun>:
0: 48 83 ec 08 sub $0x8,%rsp
4: 83 c7 05 add $0x5,%edi
7: e8 00 00 00 00 callq c <fun+0xc>
c: 48 83 c4 08 add $0x8,%rsp
10: 83 c0 07 add $0x7,%eax
13: c3 retq
correction based on Peter's comment. Yeah it does appear that registers are being used here.
And since he mentioned 6 parameters, lets try 7.
extern unsigned int more_fun
(
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int
);
unsigned int fun (
unsigned int a,
unsigned int b,
unsigned int c,
unsigned int d,
unsigned int e,
unsigned int f,
unsigned int g
)
{
return(more_fun(a+1,b+2,c+3,d+4,e+5,f+6,g+7)+17);
}
0000000000000000 <fun>:
0: 48 83 ec 10 sub $0x10,%rsp
4: 83 c1 04 add $0x4,%ecx
7: 83 c2 03 add $0x3,%edx
a: 8b 44 24 18 mov 0x18(%rsp),%eax
e: 83 c6 02 add $0x2,%esi
11: 83 c7 01 add $0x1,%edi
14: 41 83 c1 06 add $0x6,%r9d
18: 41 83 c0 05 add $0x5,%r8d
1c: 83 c0 07 add $0x7,%eax
1f: 50 push %rax
20: e8 00 00 00 00 callq 25 <fun+0x25>
25: 48 83 c4 18 add $0x18,%rsp
29: 83 c0 11 add $0x11,%eax
2c: c3 retq
and sure enough that 7th parameter was pulled from the stack modified and put back on the stack before the call. The other 6 in registers.
I am working my way through this tutorial, but it is written for 32 bit processors. I have found this note on 64 bit linux syscall. I can compile, run and objdump it.
My question is, why is the following code generated a segmentation fault.
/* shellcodetest.c */
char code[] = "\x48\x31\xc0\xb0\x3c\x48\x31\xff\x0f\x05";
int main(int argc, char **argv){
int (*func)();
func = (int (*)()) code;
(int)(*func)();
}
/*
exitt: file format elf64-x86-64
Disassembly of section .text:
0000000000400078 <_start>:
400078: 48 31 c0 xor %rax,%rax
40007b: b0 3c mov $0x3c,%al
40007d: 48 31 ff xor %rdi,%rdi
400080: 0f 05 syscall
*/
For reference syscall 60 (\x3c) is exit(). So this should just be calling exit(0);
The answer was to turn of DEP as suggested by #tylerl tylerl in the Information Security forum. #Adam-Miller explains how in the answer to
how to turn off DEP (Data Execution Prevention) without reboot? where he says, 'In linux, you can use the compiler flag "-z execstack"'
I have a small example program written in NASM(2.11.08) targeting the macho64 architecture. I'm running OSX 10.10.3:
bits 64
section .data
msg1 db 'Message One', 10, 0
msg1len equ $-msg1
msg2 db 'Message Two', 10, 0
msg2len equ $-msg2
section .text
global _main
extern _printf
_main:
sub rsp, 8 ; align
lea rdi, [rel msg1]
xor rax, rax
call _printf
lea rdi, [rel msg2]
xor rax, rax
call _printf
add rsp, 8
ret
I'm compiling and linking using the following command line:
/usr/local/bin/nasm -f macho64 test2.s
ld -macosx_version_min 10.10.0 -lSystem -o test2 test2.o
When I do an object dump on the test2 executable, this is the relevant snippet(I can post more if I'm wrong!):
0000000000001fb7 <_main>:
1fb7: 48 83 ec 08 sub $0x8,%rsp
1fbb: 48 8d 3d 56 01 00 00 lea 0x156(%rip),%rdi # 2118 <msg2+0xf3>
1fc2: 48 31 c0 xor %rax,%rax
1fc5: e8 14 00 00 00 callq 1fde <_printf$stub>
1fca: 48 8d 3d 54 00 00 00 lea 0x54(%rip),%rdi # 2025 <msg2>
1fd1: 48 31 c0 xor %rax,%rax
1fd4: e8 05 00 00 00 callq 1fde <_printf$stub>
1fd9: 48 83 c4 08 add $0x8,%rsp
1fdd: c3 retq
...
0000000000002018 <msg1>:
0000000000002025 <msg2>:
And, finally, the output:
$ ./test2
Message Two
$
My question is, what happened to msg1?
I'm assuming msg1 isn't printed because 0x14f(%rip) is not the correct address (just nulls).
Why is lea edi, [rel msg2] pointing to the correct address, while lea edi, [rel msg1] is pointing past msg2, into NULLs?
It looks like the 0x14f(%rip) offset is exactly 0x100 beyond where msg1 lies in memory (this is true throughout many tests of this problem).
What am I missing here?
Edit: Whichever message (msg1 or msg2) appears last in the .data section is the only message that gets printed.
IDK about the Mach-o ABI, but if it's the same as the SystemV x86-64 ABI GNU/Linux uses, then I think your problem is that you need to clear eax to tell a varargs function like printf that there are zero FP.
Also, lea rdi, [rel msg1] would be a much better choice. As it stands, your code is only position-independent within the low 32bits of virtual address space, because you're truncating the pointers to 32bits.
It appears NASM has a bug. This same problem came up again: NASM 2 lines of db (initialized data) seemingly not working. There, the OP confirmed that the data was present, but labels were wrong, and is hopefully reporting it upstream.
i try writing a kernel module in assembler. In one time i needed a global vars. I define a dword in .data (or .bss) section, and in init function i try add 1 to var. My program seccesfully make, but insmod sey me:
$ sudo insmod ./test.ko
insmod: ERROR: could not insert module ./test.ko: Invalid module format
it's my assembler code in nasm:
[bits 64]
global init
global cleanup
extern printk
section .data
init_mess db "Hello!", 10, 0
g_var dd 0
section .text
init:
push rbp
mov rbp, rsp
inc dword [g_var]
mov rdi, init_mess
xor rax, rax
call printk
xor rax, rax
mov rsp, rbp
pop rbp
ret
cleanup:
xor rax, rax
ret
if i write adding in C code, all work good:
static i = 0;
static int __init main_init(void) { i++; return init(); }
But in this objdump -d test.ko write a very stainght code for me:
0000000000000000 <init_module>:
0: 55 push %rbp
1: ff 05 00 00 00 00 incl 0x0(%rip) # 7 <init_module+0x7>
7: 48 89 e5 mov %rsp,%rbp
a: e8 00 00 00 00 callq f <init_module+0xf>
f: 5d pop %rbp
10: c3 retq
What does this mean (incl 0x0(%rip))? How can I access memory? Please, help me :)
(My system is archlinux x86_64)
my C code for correct make a module:
#include <linux/module.h>
#include <linux/init.h>
MODULE_AUTHOR("Actics");
MODULE_DESCRIPTION("Description");
MODULE_LICENSE("GPL");
extern int init(void);
extern int cleanup(void);
static int __init main_init(void) { return init(); }
static void __exit main_cleanup(void) { cleanup(); }
module_init(main_init);
module_exit(main_cleanup);
and my Makefile:
obj-m := test.o
test-objs := inthan.o module.o
KVERSION = $(shell uname -r)
inthan.o: inthan.asm
nasm -f elf64 -o $# $^
build:
make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules
Kernel mode lives in the "negative" (ie. top) part of the address space, where 32 bit absolute addresses can not be used (because they are not sign-extended). As you have noticed, gcc uses rip-relative addresses to work around this problem which gives offsets from the current instruction pointer. You can make nasm do the same by using the DEFAULT REL directive. See the relevant section in the nasm documentation.
you can always use inline assembly
asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \
54 : "=&r" (flag), "=r" (roksum) \
55 : "1" (addr), "g" ((long)(size)), \
56 "rm" (limit));
How can we implement the system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you can also show the code for amd64 platform.
I know in x86, we can use
__asm__(
" movl $1, %eax \n"
" movl $0, %ebx \n"
" call *%gs:0x10 \n"
);
to route to sysenter indirectly.
But how can we code using sysenter/syscall directly to issue a system call?
I find some material http://damocles.blogbus.com/tag/sysenter/ . But still find it difficult to figure out.
First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) means as part of an asm() statement.
You also need asm volatile because that's not implicit for Extended asm statements with 1 or more output operands.
I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. Here's the source of the program without an implementation of the actual system call :
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size);
int main(void)
{
const char hello[] = "Hello world!\n";
my_write(1, hello, sizeof(hello));
return 0;
}
You can see that I named my custom system call function as my_write in order to avoid name clashes with the "normal" write, provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using the 128th interrupt vector, e.g. by calling int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is achieved by the VDSO virtually mapped to each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications - instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls - some system calls like getpid and clock_gettime are so simple the kernel can export code + data that runs in user-space so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.
// i386 Linux
#include <asm/unistd.h> // compile with -m32 for 32 bit call numbers
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
: "memory" // the kernel dereferences pointer args
);
return ret;
}
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes to the eax register, while all the parameters needed for the syscall go into respectively ebx, ecx, edx, esi, edi, and ebp. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Keep in mind that the clobber list also contains the memory parameter, which means that the instruction listed in the instruction list references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications - it really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return).
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
// x86-64 Linux
#include <asm/unistd.h> // compile without -m32 for 64 bit call numbers
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"syscall"
: "=a" (ret)
// EDI RSI RDX
: "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
: "rcx", "r11", "memory"
);
return ret;
}
(See it compile on Godbolt)
Do notice how practically the only thing that needed changing were the register names, and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically provides appropriate move instructions needed for executing the instruction list.
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers. And even different arg-passing conventions for 32-bit.
Explicit register variables
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Explicit-Register-Variables.html#Explicit-Reg-Vars)
I believe this should now generally be the recommended approach over register constraints because:
it can represent all registers, including r8, r9 and r10 which are used for system call arguments: How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?
it's the only optimal option for other ISAs besides x86 like ARM, which don't have the magic register constraint names: How to specify an individual register as constraint in ARM GCC inline assembly? (besides using a temporary register + clobbers + and an extra mov instruction)
I'll argue that this syntax is more readable than using the single letter mnemonics such as S -> rsi
Register variables are used for example in glibc 2.29, see: sysdeps/unix/sysv/linux/x86_64/sysdep.h.
main_reg.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
register int64_t rax __asm__ ("rax") = 1;
register int rdi __asm__ ("rdi") = fd;
register const void *rsi __asm__ ("rsi") = buf;
register size_t rdx __asm__ ("rdx") = size;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi), "r" (rsi), "r" (rdx)
: "rcx", "r11", "memory"
);
return rax;
}
void my_exit(int exit_status) {
register int64_t rax __asm__ ("rax") = 60;
register int rdi __asm__ ("rdi") = exit_status;
__asm__ __volatile__ (
"syscall"
: "+r" (rax)
: "r" (rdi)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Compile and run:
gcc -O3 -std=c99 -ggdb3 -ffreestanding -nostdlib -Wall -Werror \
-pedantic -o main_reg.out main_reg.c
./main.out
echo $?
Output
hello world
0
For comparison, the following analogous to How to invoke a system call via syscall or sysenter in inline assembly? produces equivalent assembly:
main_constraint.c
#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (1), "D" (fd), "S" (buf), "d" (size)
: "rcx", "r11", "memory"
);
return ret;
}
void my_exit(int exit_status) {
ssize_t ret;
__asm__ __volatile__ (
"syscall"
: "=a" (ret)
: "0" (60), "D" (exit_status)
: "rcx", "r11", "memory"
);
}
void _start(void) {
char msg[] = "hello world\n";
my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}
GitHub upstream.
Disassembly of both with:
objdump -d main_reg.out
is almost identical, here is the main_reg.c one:
Disassembly of section .text:
0000000000001000 <my_write>:
1000: b8 01 00 00 00 mov $0x1,%eax
1005: 0f 05 syscall
1007: c3 retq
1008: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
100f: 00
0000000000001010 <my_exit>:
1010: b8 3c 00 00 00 mov $0x3c,%eax
1015: 0f 05 syscall
1017: c3 retq
1018: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101f: 00
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: bf 01 00 00 00 mov $0x1,%edi
102a: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: ba 0d 00 00 00 mov $0xd,%edx
1043: b8 01 00 00 00 mov $0x1,%eax
1048: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104f: 0a
1050: 0f 05 syscall
1052: 31 ff xor %edi,%edi
1054: 48 83 f8 0d cmp $0xd,%rax
1058: b8 3c 00 00 00 mov $0x3c,%eax
105d: 40 0f 95 c7 setne %dil
1061: 0f 05 syscall
1063: c3 retq
So we see that GCC inlined those tiny syscall functions as would be desired.
my_write and my_exit are the same for both, but _start in main_constraint.c is slightly different:
0000000000001020 <_start>:
1020: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
1025: 48 8d 74 24 f3 lea -0xd(%rsp),%rsi
102a: ba 0d 00 00 00 mov $0xd,%edx
102f: 48 b8 68 65 6c 6c 6f movabs $0x6f77206f6c6c6568,%rax
1036: 20 77 6f
1039: 48 89 44 24 f3 mov %rax,-0xd(%rsp)
103e: b8 01 00 00 00 mov $0x1,%eax
1043: c7 44 24 fb 72 6c 64 movl $0xa646c72,-0x5(%rsp)
104a: 0a
104b: 89 c7 mov %eax,%edi
104d: 0f 05 syscall
104f: 31 ff xor %edi,%edi
1051: 48 83 f8 0d cmp $0xd,%rax
1055: b8 3c 00 00 00 mov $0x3c,%eax
105a: 40 0f 95 c7 setne %dil
105e: 0f 05 syscall
1060: c3 retq
It is interesting to observe that in this case GCC found a slightly shorter equivalent encoding by picking:
104b: 89 c7 mov %eax,%edi
to set the fd to 1, which equals the 1 from the syscall number, rather than a more direct:
1025: bf 01 00 00 00 mov $0x1,%edi
For an in-depth discussion of the calling conventions, see also: What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Tested in Ubuntu 18.10, GCC 8.2.0.