I would like to know in kernel source version >= 2.6 where brk is defined. That is which c file contains its definition? grep is not revealing much. Also sbrk is implemented in glibc correct?
It's in mmap.c. Look for:
SYSCALL_DEFINE1(brk, unsigned long, brk)
The manual page says:
On Linux, sbrk() is implemented as a library function that uses the
brk() system call, and does some internal bookkeeping so that it can
return the old break value.
Related
What are the differences bettween linux system call mmap(2) and posix mmap(3) function?
How to distinguish which one is used when broswing the source code,since they have the same header file.For details, see below.
I am running on Ubuntu.I do not think it matters what operating system you are using.The mannual page really does not supply much useful information indeed.
As per the reply of Jörg W Mittag, I think the mmap must be posix function when i am broswing the source code.But i wonder that why i need not to explicitly link to posix library when using the mmamp(3) function .I mean no extra link flag is needed when complie the source code.
As per the reply of Faschingbauer,some question arise if we make the conclusion that no posix mmap is implenmented.You see, there are some posix function implemented(eg, shm_opn、sem_open, mq_open).In the mean time,there are corresponding ones with the same functions(eg, shmget,semget, msgget).How to explain it?At least, I think some posix functions are implemented by linux.
#log for "man 2 mmap"
MMAP(2) Linux Programmer's Manual
NAME
mmap, munmap - map or unmap files or devices into memory
SYNOPSIS
#include <sys/mman.h>
#log for "man 3 mmap"
MMAP(3POSIX) POSIX Programmer's Manual
PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux implementation of this interface may differ (consult the corresponding Linux manual page for details of Linux behavior), or the interface may not be implemented on Linux.
NAME
mmap — map pages of memory
SYNOPSIS
#include <sys/mman.h>
POSIX vs. Linux
First, some facts:
POSIX is a standard, made by a standards body. POSIX does not
implement anything, but rather define feature set and behavior of
interfaces. Part of this definition is a number of man pages - the
"POSIX Programmer's Manual"
Linux implements the POSIX standard, just like other UNIX
operating systems do. (I do not know if Linux is "POSIX certified",
nor do I care.) In implementing the POSIX standard, Linux takes the
freedom to extend the standard with Linux specific features; hence
it brings its own set of manual pages, the "Linux Programmer's
Manual".
Looking at the Linux ("man 2 mmap") man page, you can see that it
defines, for example, the MAP_LOCKED bit in the flags argument
(btw. MAP_LOCKED makes a separate call to mlock() unnecessary). This
flag does not appear in the POSIX man page ("man 3 mmap"), because it
is not required by the POSIX standard for a conforming implementation.
That said, there is no way to use an alternative implementation of
mmap() in Linux. If some source code that you are reading uses mmap(),
and you are on Linux, then the Linux implementation of mmap() is used, simply
because there is no POSIX implementation of it.
Respectively, the POSIX version is contained in the Linux
implementation. Linux is "compatible" with POSIX, so to say - it does
not redefine any feature required by POSIX (this would mean to violate the standard), but only adds extensions
like the MAP_LOCKED above.
Manual Pages
Hm. My personal opinion is that the POSIX version of, say, the mmap
man page is only there to confuse users. If you inadvertently hit the
section "3" mmap() man page, and you don't know what the relationship
is between POSIX and Linux, then you are indeed seriously confused at
best, or on the wrong track at worst.
I suggest you omit the section number and just say "man mmap" - this searches all the manual
sections for occurences of "mmap" and stops at the first (see "man man" for the exact definition).
(This does not work as envisioned with "man write" when you are
searching for the definition of the write() system call, because there
is a command "write" with the same same in section "1" :-) )
System Calls
As stated by "man man", manual section "2" contains system call
documentation. Knowing that mmap() is implemented by the Linux kernel
(because it is the kernel who implements core OS functionality like
memory management) can only help to clear up the confusion as to
whether the documentation you are reading is what you want.
What are the differences bettween linux system mmap(2) and posix mmap(3) function?
Section 2 documents syscalls. Section 3 documents functions. Therefore, mmap(2) is not a function at all, it is a syscall.
How to distinguish which one is used when broswing the source code?
If it is a function call, it is mmap(3). If it is a syscall, it is mmap(2). Since it is impossible to portably call syscalls from C, there will always be some sort of macro or wrapper function for the syscall.
Also, unless you are reading the source code of the runtime library for a C compiler (e.g. GCC's) or the source code of a POSIX library (such as Glibc, dietlibc, µClibc, or musl), it is highly unlikely that you will find any syscalls in the code.
But i wonder that why i need not to explicitly link to posix library when using the mmamp(3) function .
You need not link another library because mmap is contained in GLIBC; you can see this e. g. with
nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep mmap
00000000000e4610 W mmap
00000000000e4610 W mmap64
I am aware that an implementer has a choice of whether he wants to zero a malloc page or let OS give him a zeroed page (for more optimization purposes).
My question is simple - in Ubuntu 14.04 LTS which comes with linux kernel 3.16 and gcc 4.8.4, who will zero my pages? Is it in user land or kernel land?
It can depend on where the memory came from. The calloc code is userland, and will zero a memory page that gets re-used by a process. This happens when the memory is previously used and then freed, but not returned to the OS. However, if the page is newly allocated to the process, it will come already cleared to 0 by the OS (for security purposes), and so does not need to be cleared by calloc. This means calloc can potentially be faster than calling malloc followed by memset, since it can skip the memset if it knows it will already by zeroed.
That depends on the implementer of your standard library, not on the host system. It is not possible to give a specific answer for a particular OS, since it may be the build target of multiple compilers and their libraries - including on other systems, if you consider the possibility of cross-compiling (building on one type of system to target another).
Most implementations I've seen of calloc() use a call of malloc() followed by either a call of memset() or (with some implementations that target unix) a legacy function called bzero() - which is, itself, sometimes replaced by a macro call that expands to a call of memset() in a number of recent versions of libraries.
memset() is often hand-optimised. But, again, it is up to the implementer of the library.
I am working with Linux-3.9.3 kernel in Ubuntu 10.04. I have added a basic system call in the kernel directory of the linux-3.9.3 source tree. I am able to use it with syscall() by passing my newly system call number in it as an argument. But I want to invoke it directly by using its method name as in the case of getpid() or open() system calls. Can any one help me to add it in GNU C library. I went through few documents but did not get any clear idea of how to accomplish it.
Thanks!!!
Assuming you are on a 64 bits Linux x86-64, the relevant ABI is the x86-64 ABI. Read also the x86 calling conventions wikipage and the linux assembly howto and syscalls(2)
So syscalls are using a different convention than ordinary function calls (e.g. all arguments are passed by registers, error condition could use the carry bit). Hence, you need a C wrapper to make your syscall available to C applications.
You could look into the source code of existing C libraries, like GNU libc or musl libc (so you'll need to make your own library for that syscall).
The MUSL libc source code is very readable, see e.g. its src/unistd/fsync.c as an example.
I would suggest wrapping your new syscall in your own library without patching libc. Notice that some uncommon syscalls are sitting in a different library, e.g. request_key(2) has its C wrapper in libkeyutils
Is there a libc in kernel space? I mean you have to build kernel against some libc right? So Is there a libc (probably statically-linked) sitting within kernel space?
If yes, how is this related to userland glibc? Must they be the same version?
There is actually no libc in kernel space. Libc is user-space library, and you can't use it from kernel-space.
But almost all functions from libc that make sense in kernel space are ported. You can find headers in include/linux/ usually.
As far as I know these two implementations don't share codebase.
Some of the functions that are available in libc are implemented inside the kernel code, for example there's a printf function that works as the normal (at least as far as the kernel code it self requires).
This means that while it looks like the code uses libc (by the functions that seem to be available) there actually no need to link it with a library (AFAIK).
What is the difference between SYS_exit, sys_exit() and exit()?
What I understand :
The linux kernel provides system calls, which are listed in man 2 syscalls.
There are wrapper functions of those syscalls provided by glibc which have mostly similar names as the syscalls.
My question : In man 2 syscalls, there is no mention of SYS_exit and sys_exit(), for example. What are they?
Note : The syscall exit here is only an example. My question really is : What are SYS_xxx and sys_xxx()?
I'll use exit() as in your example although this applies to all system calls.
The functions of the form sys_exit() are the actual entry points to the kernel routine that implements the function you think of as exit(). These symbols are not even available to user-mode programmers. That is, unless you are hacking the kernel, you cannot link to these functions because their symbols are not available outside the kernel. If I wrote libmsw.a which had a file scope function like
static int msw_func() {}
defined in it, you would have no success trying to link to it because it is not exported in the libmsw symbol table; that is:
cc your_program.c libmsw.a
would yield an error like:
ld: cannot resolve symbol msw_func
because it isn't exported; the same applies for sys_exit() as contained in the kernel.
In order for a user program to get to kernel routines, the syscall(2) interface needs to be used to effect a switch from user-mode to kernel mode. When that mode-switch (somtimes called a trap) occurs a small integer is used to look up the proper kernel routine in a kernel table that maps integers to kernel functions. An entry in the table has the form
{SYS_exit, sys_exit},
Where SYS_exit is an preprocessor macro which is
#define SYS_exit (1)
and has been 1 since before you were born because there hasn't been reason to change it. It also happens to be the first entry in the table of system calls which makes look up a simple array index.
As you note in your question, the proper way for a regular user-mode program to access sys_exit is through the thin wrapper in glibc (or similar core library). The only reason you'd ever need to mess with SYS_exit or sys_exit is if you were writing kernel code.
This is now addressed in man syscall itself,
Roughly speaking, the code belonging to the system call with number __NR_xxx defined in /usr/include/asm/unistd.h can be found in the Linux kernel source in the routine sys_xxx(). (The dispatch table for i386 can be found in /usr/src/linux/arch/i386/kernel/entry.S.) There are many exceptions, however, mostly because older system calls were superseded by newer ones, and this has been treated somewhat unsystematically. On platforms with proprietary operating-system emulation, such as parisc, sparc, sparc64, and alpha, there are many additional system calls; mips64 also contains a full set of 32-bit system calls.
At least now /usr/include/asm/unistd.h is a preprocessor hack that links to either,
/usr/include/asm/unistd_32.h
/usr/include/asm/unistd_x32.h
/usr/include/asm/unistd_64.h
The C function exit() is defined in stdlib.h. Think of this as a high level event driven interface that allows you to register a callback with atexit()
/* Call all functions registered with `atexit' and `on_exit',
in the reverse of the order in which they were registered,
perform stdio cleanup, and terminate program execution with STATUS. */
extern void exit (int __status) __THROW __attribute__ ((__noreturn__));
So essentially the kernel provides an interface (C symbols) called __NR_xxx. Traditionally people want sys_exit() which is defined with a preprocessor macro SYS_exit. This macro creates the sys_exit() function. The exit() function is part of the standard C library stdlib.h and ported to other operating systems that lack the Linux Kernel ABI entirely (there may not be __NR_xxx functions) and potentially don't even have sys_* functions available either (you could write exit() to send the interrupt or use VDSO in Assembly).