Where is the system call table in the Linux kernel?

I'm reading Linux Kernel Development by Robert Love, and one of the exercises is to create a system call (page 106). The problem is that I am unable to find the system call table file in v3.9 for the x86_32 architecture. I know that he uses version 2.6.xx, but I don't know whether that version will work with my distribution, since it is quite old, so I would prefer v3.9.
More information:
The exercise of which I am speaking is the following:
Add an entry to the end of the system call table. This needs to be done for each architecture that supports the system call (which, for most calls, is all the architectures). The position of the syscall in the table, starting at zero, is its system call number. For example, the tenth entry in the list is assigned syscall number nine.
Solved using the following approach:
The system call table is located in arch/x86/syscalls/syscall_32.tbl for the x86 architecture. Thanks to Sudip Mukherjee for his help.
Another approach is the following:
http://lists.kernelnewbies.org/pipermail/kernelnewbies/2013-July/008598.html
Thanks to Srinivas Ganji for his help too.

Since Linux kernel 4.2, the system call table has moved from arch/x86/syscalls/syscall_64.tbl to arch/x86/entry/syscalls/syscall_64.tbl.
Here is the corresponding commit:
commit 1f57d5d85ba7f1f467173ff33f51d01a91f9aaf1
Author: Ingo Molnar <mingo@kernel.org>
Date: Wed Jun 3 18:36:41 2015 +0200
x86/asm/entry: Move the arch/x86/syscalls/ definitions to arch/x86/entry/syscalls/
The build time generated syscall definitions are entry code related, move
them into the arch/x86/entry/ directory.

Create a testing folder in src root: src/linux-3.4/testing/, then put inside this folder:
- a file that contains syscall code: strcpy.c
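#include <linux/linkage.h>
#include <linux/kernel.h>

/* Demo only: a real system call must not dereference user pointers directly;
 * it should use helpers such as strncpy_from_user() and copy_to_user(). */
asmlinkage long sys_strcpy(char *dest, char *src)
{
	int i = 0;

	while (src[i] != '\0') {
		dest[i] = src[i];	/* increment separately: dest[i]=src[i++] is undefined behavior */
		i++;
	}
	dest[i] = '\0';
	printk(KERN_INFO "sys_strcpy: done\n");
	return 0;
}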
and the Makefile that contains just the following line:
obj-y:=strcpy.o
Add an entry to the syscall table and the prototype of the function:
- edit the file src/linux-3.4/arch/x86/syscalls/syscall_32.tbl by adding the following line at entry 223, which is unused:
223 i386 strcpy sys_strcpy
Edit the file src/linux-3.4/include/linux/syscalls.h by adding the prototype of the function
asmlinkage long sys_strcpy(char *dest, char *src);
Edit the main Makefile in the src root (src/linux-3.4/Makefile) by adding the testing folder created before, as follows:
core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ testing/

For systems where audit is enabled, a table of the syscalls may be easily retrieved with:
ausyscall --dump
For example:
$ ausyscall --dump
Using x86_64 syscall table:
0 read
1 write
2 open
3 close
4 stat
5 fstat
6 lstat
7 poll
8 lseek
9 mmap
10 mprotect
...SNIP...

A similar question on SO where the OP seems to have solved it:
New syscall not found (linux kernel 3.0.0) where should I start looking?
The file seems to be arch/x86/kernel/syscall_table_32.S.

Related

NASM x86_64: File opening (SYS_OPEN) error list?

I'm coding a Linux x64 assembly program that reads a file, and I want to handle errors like File Not Found or permission errors.
Where can I find a list of SYS_OPEN error codes?
Approaches to find codes (kinda fun)
My code to open a file:
SYS_OPEN equ 2
O_RDONLY equ 0
section .data
filename db "file.txt", 0
section .text
global _start
_start:
mov rax, SYS_OPEN
mov rdi, filename
mov rsi, O_RDONLY
mov rdx, 0644o
syscall
[...]
When the file is successfully opened, RAX holds the file descriptor (a non-negative integer); if the call fails, RAX holds an error code (a negative integer). I managed to raise a permission error by removing all permissions for all users:
chmod 0000 file.txt
This causes an error with code -13. By deleting the file, I managed to get error -2. Where can I find a list of SYS_OPEN error codes?
PS: Maybe my googling skills are rusty
Linux system call return values from -4095 to -1 are -errno codes. (The highest error number that Linux currently defines is about 133, EHWPOISON, but that's the official range.)
strace ./myprog can decode them for you so you don't need to actually write error checking in your toy programs when playing around with system calls.
For example:
$ strace touch /tmp/xyjklj/bar
... (dynamic linker / process startup stuff)
openat(AT_FDCWD, "/tmp/xyjklj/bar", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENOENT (No such file or directory)
utimensat(AT_FDCWD, "/tmp/xyjklj/bar", NULL, 0) = -1 ENOENT (No such file or directory)
... (more system calls as touch(1) finds a locale-specific set of error messages and prints
(The -1 is what the libc wrapper function actually returns; the errno code is what strace decoded from the asm syscall return value, which the glibc wrapper will store in errno. When using raw system calls in asm, you don't have to waste instructions doing that. But strace will still say "-1", not the numeric error code)
Documentation of most ways SYS_open can fail
Each system call man page documents which error codes that particular system call can fail with, and in which cases that can happen. (Those lists aren't fully exhaustive, for example not covering weird things a specific filesystem like NFS could return, like EMULTIHOP (see comments).)
For your case, see the ERRORS section of the open(2) man page. For example, there are several entries for ENOENT, covering all the cases which can lead to that return value.
ENOENT - O_CREAT is not set and the named file does not exist.
ENOENT - A directory component in pathname does not exist or is a
dangling symbolic link.
ENOENT - pathname refers to a nonexistent directory, O_TMPFILE and
one of O_WRONLY or O_RDWR were specified in flags, but
this kernel version does not provide the O_TMPFILE
functionality.
(Spoiler alert, 2 is ENOENT, so -2 is -ENOENT.)
There are of course lots of other fun ways that pathname and file access stuff (and open(2) in particular) can error, including:
EACCES (-13) - The requested access to the file is not allowed, or search
permission is denied for one of the directories in the
path prefix of pathname, or the file did not exist yet and
write access to the parent directory is not allowed. (See
also path_resolution(7).)
EFAULT - pathname points outside your accessible address space.
ENAMETOOLONG -
pathname was too long.
EBUSY - O_EXCL was specified in flags and pathname refers to a
block device that is in use by the system (e.g., it is
mounted).
[this would require root, otherwise you'd get EACCES]
ETXTBSY - pathname refers to an executable image which is currently
being executed and write access was requested.
EWOULDBLOCK -
The O_NONBLOCK flag was specified, and an incompatible
lease was held on the file (see fcntl(2)).
ENODEV - pathname refers to a device special file and no
corresponding device exists. (This is a Linux kernel bug;
in this situation ENXIO must be returned.)
ELOOP - Too many symbolic links were encountered in resolving
pathname.
EISDIR - pathname refers to a directory and the access requested
involved writing (that is, O_WRONLY or O_RDWR is set).
ENOTDIR -
A component used as a directory in pathname is not, in
fact, a directory, or O_DIRECTORY was specified and
pathname was not a directory.
EPERM - The O_NOATIME flag was specified, but the effective user
ID of the caller did not match the owner of the file and
the caller was not privileged.
There are also various limits, like the number of open files (ENFILE, EMFILE), or ENOSPC when the disk is full. The above is not a complete list; I just picked one of the ways to get each of many (but not all) of the error codes.
As per funnydman's answer, you can look up the number -> symbolic meaning of error values in man pages. Or look in /usr/include/asm-generic/errno-base.h (The full path may differ on some systems, and you'd only include this file indirectly, via #include <errno.h>)
You can interpret these as errno values. To list all of the codes, use errno -l, and also take a look at the docs. Part of the table:
number  hex   symbol  description
2       0x02  ENOENT  No such file or directory
13      0x0d  EACCES  Permission denied
The reason for this design decision is described here: https://stackoverflow.com/a/6008711/9926721

Is seteuid a system call on Linux?

All of the literature that I have read so far on setuid talks about seteuid in a way that implies it is a system call. The section 2 man pages never say if a function is a system call or not, so seteuid(2) is no help. And if it isn't a system call, meaning the functionality is not provided by the kernel, then how can "set effective UID" be achieved?
The section 2 man pages are all system calls -- that's what section 2 is for. The section 3 man pages are all library calls, as that's what section 3 is for. See man(1) (the manual page for man itself) for the list of sections and what they are:
1 Executable programs or shell commands
2 System calls (functions provided by the kernel)
3 Library calls (functions within program libraries)
4 Special files (usually found in /dev)
5 File formats and conventions eg /etc/passwd
6 Games
7 Miscellaneous (including macro packages and conventions), e.g.
man(7), groff(7)
8 System administration commands (usually only for root)
9 Kernel routines [Non standard]
You can easily verify if it is a system call or if it is defined in libc by writing a little program and running strace on it. For example,
#include <unistd.h>

int main(void) {
    seteuid(1);  /* expected to fail with EPERM for an unprivileged user */
    return 0;
}
gcc -o main main.c
-bash-4.2$ strace ./main 2>&1 | grep set
setresuid(-1, 1, -1) = -1 EPERM (Operation not permitted)
So in this case seteuid is implemented in libc, on top of the setresuid system call. See the glibc source for the implementation.

linux kallsyms R symbol not showing

I want to find the kernel address of the system call table. I usually do this by grepping for sys_call. On one system I can see the address, but on another the entry does not show up.
root@ubuntu:~# cat /proc/kallsyms | grep sys_call
ffffffff8122aa90 t proc_sys_call_handler
ffffffff81726432 t ret_from_sys_call
ffffffff81726644 T int_ret_from_sys_call
ffffffff81728146 t sysexit_from_sys_call
ffffffff81728386 t sysretl_from_sys_call
ffffffff8172858e t ia32_ret_from_sys_call
ffffffff81801400 R sys_call_table
ffffffff81809cc0 R ia32_sys_call_table
root@ubuntu:~#
On the other system there is no sys_call_table entry. Why is the R-type symbol not shown?
/ $ cat /proc/kallsyms | grep sys_call
ffffffff8119c230 t proc_sys_call_handler
ffffffff817a1a57 t ret_from_sys_call
ffffffff817a1c50 T int_ret_from_sys_call
ffffffff817a2cb8 t sysexit_from_sys_call
ffffffff817a2ed8 t sysretl_from_sys_call
ffffffff817a30be t ia32_ret_from_sys_call
/ $
/ $
In what cases can this happen? Any advice would be appreciated.
Thank you.
You should check the kernel version on both systems with uname -r.
In early kernels (2.4.x), sys_call_table was exported through the line "EXPORT_SYMBOL(sys_call_table);" in linux/kernel/ksyms.c. As I understand it, that export was later removed.
It has been exported again in some of the more recent kernels (in some versions > 3.3.x). I would recommend digging into the LXR to check out the details.
You need to check whether your current kernel is compiled with the option CONFIG_KALLSYMS_ALL=y

Get the path of link that it points to?

Is it possible to get the absolute path that a symbolic link points to?
Is there a simple system command for this?
I need it for all of the following operating systems:
HP-UX 11i, 1123u, 1123i
AIX 5.2 and 5.3
Suse Linux 10
Solaris 10
You didn't specify a language, so I assume you want a command that can be run in whatever shell you are using. The ls command has the -l option (that is a lowercase L), which prints detailed information about the file. For a symbolic link, the last field is the path the link points to, so you should be able to say
ls -l file | awk '{print $NF}'
on any SUS2-compliant machine (which should be all of the commercial UNIXes). This will have a problem if the file or any of the directories leading up to the file has spaces in its name, though.
If you are looking for a system call, you want readlink(2). This is standardized, and so should be available on all POSIX compliant systems.
Here's an example of its usage, taken from the link given earlier:
#include <unistd.h>
char buf[1024];
ssize_t len;
if ((len = readlink("/modules/pass1", buf, sizeof(buf)-1)) != -1)
buf[len] = '\0';
If you're looking for a command line utility, it doesn't look like there is one standardized, but GNU (Linux) and BSD both have readlink(1).

Why is ARG_MAX not defined via limits.h?

On Fedora Core 7, I'm writing some code that relies on ARG_MAX. However, even if I #include <limits.h>, the constant is still not defined. My investigations show that it's present in <sys/linux/limits.h>, but this is supposed to be portable across Win32/Mac/Linux, so directly including it isn't an option. What's going on here?
The reason it's not in limits.h is that it's not a quantity giving the limits of the value range of an integral type based on bit width on the current architecture. That's the role assigned to limits.h by the ISO standard.
The value in which you're interested is not hardware-bound in practice and can vary from platform to platform and perhaps system build to system build.
The correct thing to do is to call sysconf(_SC_ARG_MAX) to get the value at run time; _POSIX_ARG_MAX is only the minimum that POSIX guarantees. I think that's the POSIX-compliant solution anyway.
According to my documentation, you include unistd.h, limits.h, or both, depending on which values you're requesting.
One other point: many implementations of the exec family of functions return E2BIG or a similar value if you try to call them with an oversized environment. This is one of the defined conditions under which exec can actually return.
For the edification of future people like myself who find themselves here after a web search for "arg_max posix", here is a demonstration of the POSIXly-correct method for discerning ARG_MAX on your system that Thomas Kammeyer refers to in his answer:
cc -x c <(echo '
#include <unistd.h>
#include <stdio.h>
int main() { printf("%li\n", sysconf(_SC_ARG_MAX)); }
')
This uses the process substitution feature of Bash; put the same lines in a file and run cc thefile.c if you are using some other shell.
Here's the output for macOS 10.14:
$ ./a.out
262144
Here's the output for a RHEL 7.x system configured for use in an HPC environment:
$ ./a.out
4611686018427387903
$ ./a.out | numfmt --to=iec-i # 'numfmt' from GNU coreutils
4.0Ei
For contrast, here is the method prescribed by https://porkmail.org/era/unix/arg-max.html, which uses the C preprocessor:
cpp <<HERE | tail -1
#include <limits.h>
ARG_MAX
HERE
This does not work on Linux for reasons still not entirely clear to me—I am not a systems programmer and not conversant in the POSIX or ISO specs—but probably explained above.
ARG_MAX is defined in /usr/include/linux/limits.h. My linux kernel version is 3.2.0-38.
