Implementation of function execve (unistd.h)

How can I see the implementation of the execve function on x86_64 Linux? Is it in the library behind unistd.h?
I ask because I want to know how to run an external program from assembly without calling the execve library function.
I know there is a system call named execve, but I don't know how to use it.
How do I put a char * value and a char *[] value into registers?

The implementation of the execve() function in userspace looks something like:
int execve(const char *filename, char * const argv[], char * const envp[]) {
    return syscall(SYS_execve, filename, argv, envp);
}
All of the actual "work" is done in the kernel. There's nothing particularly interesting happening in libc, besides perhaps some threading cleanup.

Just take a look at the kernel sources (more specifically: arch/YOUR-ARCH/kernel/entry*.S; on current x86 kernels, arch/x86/entry/entry_64.S) for the system call convention on your architecture (registers and/or stack for the syscall number and the parameters).
On ARM, for example, you would load __NR_execve into r7, load the arguments into r0, r1, r2 and then use swi 0. You might be interested in this explanation of ARM EABI syscalls for more details.

There is no single, straightforward implementation of the system call wrappers in the glibc source: they are generated at build time from various files that define the system calls.
The relevant information can be found in sysdep.h if you can follow it, except for the actual system call numbers (you want __NR_execve from #include <asm/unistd.h>; it is 59 on x86_64).
The system call number goes in %rax, and the arguments go in %rdi, %rsi and %rdx. All of this (including stack alignment and which registers the kernel clobbers) is described in comments in sysdep.h.
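As a rough sketch of the register convention just described, the C function below issues execve with the raw syscall instruction through GCC inline asm. A char * and a char *[] are both just addresses, so each one fits in a single register. The names raw_execve and run_and_wait are made up for this illustration, and the asm path assumes x86-64 Linux (other targets fall back to the libc syscall() wrapper).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_execve */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: execve through the raw `syscall` instruction.
 * It returns only if the call failed. On x86-64 the result is a negated
 * errno value; the fallback path returns -1 and sets errno instead. */
static long raw_execve(const char *path, char *const argv[], char *const envp[])
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                  /* result comes back in rax       */
        : "a"((long)SYS_execve),     /* system call number in rax      */
          "D"(path),                 /* 1st argument (char *)   in rdi */
          "S"(argv),                 /* 2nd argument (char *[]) in rsi */
          "d"(envp)                  /* 3rd argument (char *[]) in rdx */
        : "rcx", "r11", "memory");   /* the instruction clobbers rcx and r11 */
    return ret;
#else
    return syscall(SYS_execve, path, argv, envp);
#endif
}

/* Hypothetical helper: run a program in a child process via raw_execve()
 * and return its exit status, or -1 on error. */
static int run_and_wait(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };   /* empty environment */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raw_execve(path, argv, envp);
        _exit(127);                  /* reached only if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In hand-written assembly the same thing is four register loads followed by syscall; the inline-asm constraints only tell the compiler to do those loads.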

Related

On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program?

I decided yesterday to learn assembly (NASM syntax) after years of C++ and Python and I'm already confused about the way to exit a program. It's mostly about ret because it's the suggested instruction on SASM IDE.
I'm speaking for main obviously. I don't care about x86 backward compatibility. Only the x64 Linux best way. I'm curious.

If you use printf or other libc functions, it's best to ret from main or call exit. (Which are equivalent; main's caller will call the libc exit function.)
If not, i.e. if you were only making other raw system calls like write with syscall, it's also appropriate and consistent to exit that way, but either way, ret or call exit are 100% fine in main.
If you want to work without libc at all, e.g. put your code under _start: instead of main: and link with ld or gcc -static -nostdlib, then you can't use ret. Use mov eax, 231 (__NR_exit_group) / syscall.
main is a real & normal function like any other (called with a valid return address), but _start (the process entry point) isn't. On entry to _start, the stack holds argc and argv, so trying to ret would set RIP=argc, and then code-fetch would segfault on that unmapped address. (See: Nasm segmentation fault on RET in _start.)
System call vs. ret-from-main
Exiting via a system call is like calling _exit() in C - skip atexit() and libc cleanup, notably not flushing any buffered stdout output (line buffered on a terminal, full-buffered otherwise).
This leads to symptoms such as those in "Using printf in assembly leads to empty output when piping, but works on the terminal" (or, if your output doesn't end with \n, even on a terminal).
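The buffering effect can be reproduced from C. In the sketch below, a child process prints to a pipe and then leaves through either exit() or _exit(), and the parent counts the bytes that actually arrived. The function name bytes_seen_after is invented for this example, and the message deliberately has no trailing newline so that it stays buffered whatever mode stdout is in.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child whose stdout is a pipe, let it
 * printf() five bytes, then terminate with _exit() (use_raw_exit != 0)
 * or exit(). Returns the number of bytes that reached the pipe, or -1. */
static long bytes_seen_after(int use_raw_exit)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    fflush(NULL);                    /* keep pending parent output out of the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        printf("hello");             /* sits in the stdio buffer */
        if (use_raw_exit)
            _exit(0);                /* like a raw exit syscall: buffer is lost */
        exit(0);                     /* libc cleanup flushes the buffer */
    }
    close(fds[1]);
    char buf[64];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With exit() the parent should see all five bytes; with _exit() it should see none, which is the "empty output when piping" symptom.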
main is a function, called (indirectly) from CRT startup code. (Assuming you link your program normally, like you would a C program.) Your hand-written main works exactly like a compiler-generated C main function would. Its caller (__libc_start_main) really does do something like int result = main(argc, argv); exit(result);, e.g. call rax (pointer passed by _start) / mov edi, eax / call exit.
So returning from main is exactly (see footnote 1) like calling exit.
See "Syscall implementation of exit()" for a comparison of the relevant C functions, exit vs. _exit vs. exit_group, and the underlying asm system calls.
The C question "What is the difference between exit and return?" is primarily about exit() vs. return, although there is mention of calling _exit() directly, i.e. just making a system call. It's applicable because C main compiles to an asm main just like you'd write by hand.
Footnote 1: You can invent a hypothetical intentionally weird case where it's different. e.g. you used stack space in main as your stdio buffer with sub rsp, 1024 / mov rsi, rsp / ... / call setvbuf. Then returning from main would involve putting RSP above that buffer, and __libc_start_main's call to exit could overwrite some of that buffer with return addresses and locals before execution reached the fflush cleanup. This mistake is more obvious in asm than C because you need leave or mov rsp, rbp or add rsp, 1024 or something to point RSP at your return address.
In C++, return from main runs destructors for its locals (before global/static exit stuff), exit doesn't. But that just means the compiler makes asm that does more stuff before actually running the ret, so it's all manual in asm, like in C.
The other difference is of course the asm / calling-convention details: exit status in EAX (return value) or EDI (first arg), and of course to ret you have to have RSP pointing at your return address, like it was on function entry. With call exit you don't, and you can even do a conditional tailcall of exit like jne exit. Since it's a noreturn function, you don't really need RSP pointing at a valid return address. (RSP should be aligned by 16 before a call, though, or RSP%16 = 8 before a tailcall, matching the alignment after call pushes a return address. It's unlikely that exit / fflush cleanup will do any alignment-required stores/loads to the stack, but it's a good habit to get this right.)
(This whole footnote is about ret vs. call exit, not syscall, so it's a bit of a tangent from the rest of the answer. You can also run syscall without caring where the stack-pointer points.)
SYS_exit vs. SYS_exit_group raw system calls
The raw SYS_exit system call is for exiting the current thread, like pthread_exit().
(eax=60 / syscall, or eax=1 / int 0x80).
SYS_exit_group is for exiting the whole program, like _exit.
(eax=231 / syscall, or eax=252 / int 0x80).
In a single-threaded program you can use either, but conceptually exit_group makes more sense to me if you're going to use raw system calls. glibc's _exit() wrapper function actually uses the exit_group system call (since glibc 2.3). See Syscall implementation of exit() for more details.
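For comparison with the asm sequences above, here is a small C sketch that terminates a child through the raw exit_group call, bypassing libc's exit() and _exit() entirely. child_exit_group_status is a made-up name, and the call goes through libc's generic syscall() function rather than a hand-written syscall instruction.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_exit_group */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates with the raw
 * exit_group system call, and return the status the parent observes
 * (or -1 on error). */
static int child_exit_group_status(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Ends every thread of the child; no atexit handlers, no stdio flush. */
        syscall(SYS_exit_group, status);
        _exit(127);                  /* not reached */
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Replacing SYS_exit_group with SYS_exit would behave the same here, because the child has only one thread.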
However, nearly all the hand-written asm you'll ever see uses SYS_exit (see note 1). It's not "wrong", and SYS_exit is perfectly acceptable for a program that didn't start more threads. Especially if you're trying to save code size with xor eax,eax / inc eax (3 bytes in 32-bit mode) or push 60 / pop rax (3 bytes in 64-bit mode), while push 231/pop rax would be even larger than mov eax,231 because it doesn't fit in a signed imm8.
Note 1: (Usually actually hard-coding the number, not using __NR_... constants from asm/unistd.h or their SYS_... names from sys/syscall.h)
And historically, it's all there was. Note that in unistd_32.h, __NR_exit has call number 1, but __NR_exit_group = 252 wasn't added until years later when the kernel gained support for tasks that share virtual address space with their parent, aka threads started by clone(2). This is when SYS_exit conceptually became "exit current thread". (But one could easily and convincingly argue that in a single-threaded program, SYS_exit does still mean exit the whole program, because it only differs from exit_group if there are multiple threads.)
To be honest, I've never used eax=252 / int 0x80 in anything, only ever eax=1. It's only in 64-bit code where I often use mov eax,231 instead of mov eax,60 because neither number is "simple" or memorable the way 1 is, so might as well be a cool guy and use the "modern" exit_group way in my single-threaded toy program / experiment / microbenchmark / SO answer. :P (If I didn't enjoy tilting at windmills, I wouldn't spend so much time on assembly, especially on SO.)
And BTW, I usually use NASM for one-off experiments so it's inconvenient to use pre-defined symbolic constants for call numbers; with GCC to preprocess a .S before running GAS you can make your code self-documenting with #include <sys/syscall.h> so you can use mov $SYS_exit_group, %eax (or $__NR_exit_group), or mov eax, __NR_exit_group with .intel_syntax noprefix.
Don't use the 32-bit int 0x80 ABI in 64-bit code:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains what happens if you use the COMPAT_IA32_EMULATION int 0x80 ABI in 64-bit code.
It's totally fine for just exiting, as long as your kernel has that support compiled in, otherwise it will segfault just like any other random int number like int 0x7f. (e.g. on WSL1, or people that built custom kernels and disabled that support.)
But the only reason you'd do it that way in asm would be so you could build the same source file with nasm -felf32 or nasm -felf64. (You can't use syscall in 32-bit code, except on some AMD CPUs which have a 32-bit version of syscall. And the 32-bit ABI uses different call numbers anyway so this wouldn't let the same source be useful for both modes.)
Related:
Why am I allowed to exit main using ret? (CRT startup code calls main, you're not returning directly to the kernel.)
Nasm segmentation fault on RET in _start - you can't ret from _start
Using printf in assembly leads to empty output when piping, but works on the terminal - stdout buffer (not) flushing with raw system call exit
Syscall implementation of exit() - call exit vs. mov eax,60/syscall (_exit) vs. mov eax,231/syscall (exit_group).
Can't call C standard library function on 64-bit Linux from assembly (yasm) code - modern Linux distros config GCC in a way that call exit or call puts won't link with nasm -felf64 foo.asm && gcc foo.o.
Is main() really start of a C++ program? - Ciro's answer is a deep dive into how glibc + its CRT startup code actually call main (including x86-64 asm disassembly in GDB), and shows the glibc source code for __libc_start_main.
Linux x86 Program Start Up or - How the heck do we get to main()? - 32-bit asm, and more detail than you'll probably want until you're a lot more comfortable with asm, but if you've ever wondered why CRT runs so much code before getting to main, that covers what's happening at a level that's a couple steps up from using GDB with starti (stop at the process entry point, e.g. in the dynamic linker's _start) and stepi until you get to your own _start or main.
https://stackoverflow.com/tags/x86/info - lots of good links about this and everything else.

print an unknown length argument in linux assembly

In Linux assembly, we can write a string to standard output with the write system call. This system call needs the string length, but the argument doesn't have the same length on every run.
I know that we can compute the length of the argument by scanning it for the null byte. However, I am looking for a simpler way to print an argument (or any string of unknown length) from Linux assembly.
Can anyone tell me the simplest way to print a string of unknown length in Linux assembly?

There are no Linux system calls that write an implicit-length string (C-style null-terminated) to a file descriptor. So you have to just work out the length yourself before making a system call.
Linux is portable across many architectures, so I'll express the answer in portable assembly language, aka C:
#include <string.h>   // strlen
#include <unistd.h>   // write, ssize_t
ssize_t write_implicit_length_string(const char *str) {
    size_t size = strlen(str);
    return write(1, str, size); // stdout is always fd 1
}
If you want to see the asm, compile it with gcc -S (although that will just show you a function call to strlen; gcc -O3 doesn't inline code for strlen on x86).
As far as asm implementations of strlen, for x86-64 your best bet is an SSE2 loop that uses pcmpeqb / pmovmskb / test / jnz to find the first zero byte. Obviously every ISA will have its own way of doing it, but the important point is that there's no way to have the kernel do it for you.
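The pcmpeqb / pmovmskb loop mentioned above can be sketched in C with SSE2 intrinsics, which compile to those same instructions. This is an illustration rather than a tuned implementation. The name sse2_strlen is invented, the code relies on the GCC/Clang builtin __builtin_ctz, and it falls back to a byte loop when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>               /* SSE2 intrinsics */
#endif

/* Hypothetical name; this is not a libc function. */
static size_t sse2_strlen(const char *s)
{
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    /* Start at the enclosing 16-byte boundary: an aligned 16-byte load
     * cannot cross into a different (possibly unmapped) page. */
    const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
    unsigned skip = (unsigned)((uintptr_t)s & 15);

    __m128i chunk = _mm_load_si128((const __m128i *)p);
    /* pcmpeqb + pmovmskb: bit i is set when byte i of the chunk is zero. */
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask >>= skip;                   /* ignore the bytes that precede s */
    if (mask)
        return (size_t)__builtin_ctz(mask);

    for (;;) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask)                    /* the test / jnz step of the asm loop */
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
    }
#else
    size_t n = 0;                    /* portable fallback: one byte at a time */
    while (s[n] != '\0')
        n++;
    return n;
#endif
}
```

The aligned loads may read a few bytes outside the string itself. That is harmless at the machine level, but memory checkers such as AddressSanitizer will report it.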
There are C standard library functions that print strings to stdio FILE * (e.g. fputs) but not to unix file descriptors (libc just has wrappers for system calls).

Linux exit function

I am trying to understand the Linux syscall mechanism. I am reading a book, and it says that the exit function looks like this (in gdb):
mov $0x0,%ebx
mov $0x1,%eax
int $0x80
I understand that this is a syscall to exit, but on my Debian system it looks like this:
jmp *0x8049698
push $0x8
jmp 0x80482c0
Can someone explain why it's not the same? When I try to disassemble at 0x80482c0, gdb prints:
No function contains specified address.
Also, can someone give me a good reference for Linux internals material (like Windows Internals)?
Thanks!

The function you most likely called is exit() from the C standard library (see man 3 exit). It is a library function which in turn makes the exit system call; it is not a system call itself. You will not see that neat int 0x80 code in the disassembly of your C program. Functions such as exit() and syscall() live in a library, so your program only contains a call into that library (in a dynamically linked program, through a PLT stub like the jmp / push / jmp sequence you disassembled); the functions themselves are not part of your program.
If you want to see exactly that int 0x80 code, you can put that asm inline in your C application. This is considered bad practice, though, because your code becomes architecture-dependent (x86 only, in your case).
"Can someone give me a good reference to Linux Internals material?"
The code itself is the best up-to-date reference. All books are more or less outdated. Also look into the Documentation/ directory in the kernel sources.

What is the return value of the "inline assembly" code?

// gcc -g stack.c -o stack
//
#include <stdio.h>
unsigned long sp(void){ __asm__("mov %esp, %eax");}
int main(int argc, char **argv)
{
    unsigned long esp = sp();
    printf("Stack pointer (ESP : 0x%lx)\n",esp);
    return 0;
}
Please check the above code. In fact, sp() returns the value of the esp register via esp -> eax, I guess. But why? Is eax the default return value of sp()?
Could someone tell me more about it? Thanks!

The way a processor architecture organizes arguments, calls, and returns (and syscalls into the kernel), i.e. the calling convention, is specified in the ABI (application binary interface). For Linux on x86-64 you should read the x86-64 ABI document. And yes, the value of a function returning a long is returned through %rax on x86-64 (%eax in 32-bit code). (There is also the x32 ABI.)
Notice that it is mostly a convention, but if the convention changes, you'll need to change the compiler, perhaps the linker, the kernel, and all the libraries. Actually, it is so important that processor makers design the silicon with existing ABIs in mind (e.g. the importance of the %esp register, the SYSENTER instruction, ...).

Those are the rules!
The calling convention used by GCC for 32-bit x86 code is for the return value of an integer-returning function to be the value in %eax. GCC follows this for functions written with inline assembly as well.
See Wikipedia for all the details.
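The function in the question has no return statement, so it only appears to work because the caller happens to read %eax; an optimizing compiler is free to break that. A more robust sketch hands the register value to C through an output constraint and returns it normally. read_sp is a made-up name, and the non-x86 branch uses the GCC/Clang builtin __builtin_frame_address as a stand-in, which yields the frame address rather than the exact stack pointer.

```c
#include <stdint.h>

/* Hypothetical helper: return the current stack pointer as an integer. */
static uintptr_t read_sp(void)
{
    uintptr_t sp;
#if defined(__x86_64__)
    __asm__ volatile ("mov %%rsp, %0" : "=r"(sp));   /* 64-bit stack pointer */
#elif defined(__i386__)
    __asm__ volatile ("mov %%esp, %0" : "=r"(sp));   /* 32-bit stack pointer */
#else
    sp = (uintptr_t)__builtin_frame_address(0);      /* approximation elsewhere */
#endif
    return sp;   /* the compiler places this in %rax / %eax per the ABI */
}
```

Here the compiler, not the programmer, decides which register the asm writes to, and the ordinary return statement takes care of the calling convention.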

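In Intel syntax the instruction would be written mov eax, esp. GCC's inline asm defaults to AT&T syntax, where the source operand comes first, so the question's mov %esp, %eax already copies the stack pointer into %eax:
unsigned long sp(void){ __asm__("mov %esp, %eax");}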

Good references for the syscalls

I need a good reference, preferably with some nice examples. I need it because I am starting to write assembly code with the NASM assembler. I have this reference:
http://bluemaster.iu.hio.no/edu/dark/lin-asm/syscalls.html
which is quite nice and useful, but it has a lot of limitations because it doesn't explain the fields in the other registers. For example, if I am using the write syscall, I know I should put 1 in the EAX register, and ECX is probably a pointer to the string, but what about EBX and EDX? I would like that to be explained too: that EBX determines the input (0 for stdin, 1 for something else, etc.), that EDX is the length of the string to be entered, and so on. I hope you understand what I want; I couldn't find any such material, which is why I am asking here.
Thanks in advance.

The standard programming language in Linux is C. Because of that, the best descriptions of the system calls will show them as C functions to be called. Given their description as a C function and a knowledge of how to map them to the actual system call in assembly, you will be able to use any system call you want easily.
First, you need a reference for all the system calls as they would appear to a C programmer. The best one I know of is the Linux man-pages project, in particular the system calls section.
Let's take the write system call as an example, since it is the one in your question. As you can see, the first parameter is a signed integer, which is usually a file descriptor returned by the open syscall. These file descriptors could also have been inherited from your parent process, as usually happens for the first three file descriptors (0=stdin, 1=stdout, 2=stderr). The second parameter is a pointer to a buffer, and the third parameter is the buffer's size (as an unsigned integer). Finally, the function returns a signed integer, which is the number of bytes written, or a negative number for an error.
Now, how to map this to the actual system call? There are many ways to do a system call on 32-bit x86 (which is probably what you are using, based on your register names); be careful that it is completely different on 64-bit x86 (be sure you are assembling in 32-bit mode and linking a 32-bit executable; see this question for an example of how things can go wrong otherwise). The oldest, simplest and slowest of them in the 32-bit x86 is the int $0x80 method.
For the int $0x80 method, you put the system call number in %eax, and the parameters in %ebx, %ecx, %edx, %esi, %edi, and %ebp, in that order. Then you execute int $0x80, and the return value from the system call is in %eax. Note that this return value is different from what the reference says; the reference shows how the C library will return it, but the system call returns -errno on error (for instance -EINVAL). The C library will move this to errno and return -1 in that case. See syscalls(2) and intro(2) for more detail.
So, in the write example, you would put the write system call number in %eax, the first parameter (file descriptor number) in %ebx, the second parameter (pointer to the string) in %ecx, and the third parameter (length of the string) in %edx. The system call will return in %eax either the number of bytes written, or the error number negated (if the return value is between -1 and -4095, it is a negated error number).
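The -errno convention is easy to observe from C. The sketch below makes a raw write system call and then applies the same translation the C library performs. It assumes a 64-bit x86 build, so it uses the syscall instruction with %rdi, %rsi and %rdx rather than int $0x80 with %ebx, %ecx and %edx; the return-value convention is the same. The names raw_write and write_like_libc are invented for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>

/* Hypothetical helper: write() without the libc wrapper. Returns the
 * number of bytes written, or a negated errno value on failure. */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                          /* result in rax */
        : "a"((long)SYS_write),              /* call number in rax */
          "D"((long)fd), "S"(buf), "d"(len)  /* args in rdi, rsi, rdx */
        : "rcx", "r11", "memory");
    return ret;
#else
    /* Elsewhere: use libc's syscall() and undo its errno translation. */
    long ret = syscall(SYS_write, fd, buf, len);
    return ret < 0 ? -errno : ret;
#endif
}

/* Hypothetical helper: what a libc-style wrapper does with the raw result. */
static long write_like_libc(int fd, const void *buf, size_t len)
{
    long ret = raw_write(fd, buf, len);
    if (ret < 0 && ret >= -4095) {           /* negated error number */
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

For example, writing to file descriptor -1 should make raw_write return -EBADF, while write_like_libc returns -1 and leaves EBADF in errno.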
Finally, how do you find the system call numbers? They can be found at /usr/include/linux/unistd.h. On my system, this just includes /usr/include/asm/unistd.h, which finally includes /usr/include/asm/unistd_32.h, so the numbers are there (for write, you can see __NR_write is 4). The same goes for the error numbers, which come from /usr/include/linux/errno.h (on my system, after chasing the inclusion chain I find the first ones at /usr/include/asm-generic/errno-base.h and the rest at /usr/include/asm-generic/errno.h). For the system calls which use other constants or structures, their documentation tells which headers you should look at to find the corresponding definitions.
Now, as I said, int $0x80 is the oldest and slowest method. Newer processors have special system call instructions which are faster. To use them, the kernel makes available a virtual dynamic shared object (the vDSO; it is like a shared library, but in memory only) with a function you can call to do a system call using the best method available for your hardware. It also makes available special functions to get the current time without even having to do a system call, and a few other things. Of course, it is a bit harder to use if you are not using a dynamic linker.
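To see that the vDSO really is a shared object mapped into the process, a program can ask for its load address through the auxiliary vector. The sketch below assumes a C library that provides getauxval() in <sys/auxv.h> (glibc and musl both do); the helper names are made up for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>      /* getauxval, AT_SYSINFO_EHDR */

/* Hypothetical helper: address where the kernel mapped the vDSO,
 * or NULL if the kernel did not provide one. */
static const void *vdso_base(void)
{
    return (const void *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: the vDSO is an ordinary ELF image, so its first
 * four bytes should be the ELF magic number "\177ELF". */
static int vdso_looks_like_elf(void)
{
    const void *base = vdso_base();
    return base != NULL && memcmp(base, "\177ELF", 4) == 0;
}
```

Finding a function inside the vDSO means parsing that ELF image's dynamic symbol table, which the dynamic linker normally does for you. That is the extra work the paragraph above alludes to.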
There is also another older method, the vsyscall, which is similar to the vDSO but uses a single page at a fixed address. This method is deprecated, will result in warnings on the system log if you are using recent kernels, can be disabled on boot on even more recent kernels, and might be removed in the future. Do not use it.

If you download that web page (like it suggests in the second paragraph) and download the kernel sources, you can click the links in the "Source" column, and go directly to the source file that implements the system calls. You can read their C signatures to see what each parameter is used for.
If you're just looking for a quick reference, each of those system calls has a C library interface with the same name minus the sys_ prefix. So, for example, you could check out man 2 lseek to get the information about the parameters for sys_lseek:
off_t lseek(int fd, off_t offset, int whence);
where, as you can see, the parameters match the ones from your HTML table:
%ebx          %ecx   %edx
unsigned int  off_t  unsigned int
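The same left-to-right correspondence can be tried from C: the three arguments in the man page signature are passed in order, and libc's generic syscall() function places them in the argument registers for the current architecture. A minimal sketch, with raw_lseek as a made-up name and a long offset as a simplification (on 32-bit systems large offsets go through a separate llseek call):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>   /* SYS_lseek */
#include <unistd.h>

/* Hypothetical helper: lseek by system call number. The arguments map
 * onto the table above in order: fd, offset, whence. */
static long raw_lseek(int fd, long offset, int whence)
{
    return syscall(SYS_lseek, fd, offset, whence);
}
```

Like the libc wrappers, syscall() returns -1 and sets errno on failure instead of returning the raw negated error number.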
