All of the literature that I have read so far on setuid talks about seteuid in a way that implies it is a system call. The section 2 man pages never say if a function is a system call or not, so seteuid(2) is no help. And if it isn't a system call, meaning the functionality is not provided by the kernel, then how can "set effective UID" be achieved?
The section 2 man pages are all system calls -- that's what section 2 is for. The section 3 man pages are all library calls, as that's what section 3 is for. See man(1) (the manual page for man itself) for the list of sections and what they are:
1 Executable programs or shell commands
2 System calls (functions provided by the kernel)
3 Library calls (functions within program libraries)
4 Special files (usually found in /dev)
5 File formats and conventions eg /etc/passwd
6 Games
7 Miscellaneous (including macro packages and conventions), e.g.
man(7), groff(7)
8 System administration commands (usually only for root)
9 Kernel routines [Non standard]
You can easily verify if it is a system call or if it is defined in libc by writing a little program and running strace on it. For example,
#include <unistd.h>

int main(void) {
    seteuid(1);  /* try to set the effective UID to 1 */
    return 0;
}
gcc -o main main.c
-bash-4.2$ strace ./main 2>&1 | grep set
setresuid(-1, 1, -1) = -1 EPERM (Operation not permitted)
So in this case seteuid is implemented in libc, as a wrapper around the setresuid system call.
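To make the mapping in that strace line concrete, here is a minimal sketch in Python, assuming a Linux system where os.setresuid is available. The helper name set_effective_uid is made up for illustration. It only sets the effective UID to the value it already has, so it should not need any privileges.

```python
import os


def set_effective_uid(euid):
    """Change only the effective UID, the way the strace output suggests
    glibc's seteuid() does: setresuid(-1, euid, -1)."""
    # -1 means "leave this ID unchanged" for the real and saved UIDs.
    os.setresuid(-1, euid, -1)


before = os.getresuid()
# Setting the effective UID to its current value needs no privileges.
set_effective_uid(os.geteuid())
after = os.getresuid()
print(before, after)
```

Passing a different UID as an unprivileged user should fail with EPERM, just like the call in the trace.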
Tying a script to a specific interpreter via a so-called shebang line is a well-known practice on POSIX operating systems. For example, if the following script is executed (given sufficient file-system permissions), the operating system will launch the /bin/sh interpreter with the file name of the script as its first argument. Subsequently, the shell will execute the commands in the script skipping over the shebang line which it will treat as a comment.
#! /bin/sh
date -R
echo hello world
Possible output:
Sat, 01 Apr 2017 12:34:56 +0100
hello world
I used to believe that the interpreter (/bin/sh in this example) must be a native executable and cannot be a script itself that, in turn, would require yet another interpreter to be launched.
However, I went ahead and tried the following experiment nonetheless.
Using the following dumb shell saved as /tmp/interpreter.py, …
#! /usr/bin/python3
import sys
import subprocess
for script in sys.argv[1:]:
    with open(script) as istr:
        status = any(
            map(
                subprocess.call,
                map(
                    str.split,
                    filter(
                        lambda s : s and not s.startswith('#'),
                        map(str.strip, istr)
                    )
                )
            )
        )
    if status:
        sys.exit(status)
… and the following script saved as /tmp/script.xyz,
#! /tmp/interpreter.py
date -R
echo hello world
… I was able (after making both files executable) to execute script.xyz.
5gon12eder:/tmp> ls -l
total 8
-rwxr-x--- 1 5gon12eder 5gon12eder 493 Jun 19 01:01 interpreter.py
-rwxr-x--- 1 5gon12eder 5gon12eder 70 Jun 19 01:02 script.xyz
5gon12eder:/tmp> ./script.xyz
Mon, 19 Jun 2017 01:07:19 +0200
hello world
This surprised me. I was even able to launch script.xyz via another script.
So, what I am asking is this:
Is the behavior observed by my experiment portable?
Was the experiment even conducted correctly or are there situations where this doesn't work? How about different (Unix-like) operating systems?
If this is supposed to work, is it true that there is no observable difference between a native executable and an interpreted script as far as invocation is concerned?
New executables in Unix-like operating systems are started by the system call execve(2). The man page for execve includes:
Interpreter scripts
An interpreter script is a text file that has execute
permission enabled and whose first line is of the form:
#! interpreter [optional-arg]
The interpreter must be a valid pathname for an executable which
is not itself a script. If the filename argument of execve()
specifies an interpreter script, then interpreter will be invoked
with the following arguments:
interpreter [optional-arg] filename arg...
where arg... is the series of words pointed to by the argv
argument of execve().
For portable use, optional-arg should either be absent, or be
specified as a single word (i.e., it should not contain white
space); see NOTES below.
So within those constraints (Unix-like, optional-arg at most one word), yes, shebang scripts are portable. Read the man page for more details, including other differences in invocation between binary executables and scripts.
See the following passage, in particular the part about scripts serving as interpreters of other scripts:
This mechanism allows scripts to be used in virtually any context
normal compiled programs can be, including as full system programs,
and even as interpreters of other scripts. As a caveat, though, some
early versions of kernel support limited the length of the interpreter
directive to roughly 32 characters (just 16 in its first
implementation), would fail to split the interpreter name from any
parameters in the directive, or had other quirks. Additionally, some
modern systems allow the entire mechanism to be constrained or
disabled for security purposes (for example, set-user-id support has
been disabled for scripts on many systems). -- WP
And this output from COLUMNS=75 man execve | grep -nA 23 " Interpreter scripts" | head -39 on an Ubuntu 17.04 box, particularly lines #186-#189, which tell us what works on Linux (i.e. scripts can be interpreters, up to four levels deep):
166: Interpreter scripts
167- An interpreter script is a text file that has execute permission
168- enabled and whose first line is of the form:
169-
170- #! interpreter [optional-arg]
171-
172- The interpreter must be a valid pathname for an executable file.
173- If the filename argument of execve() specifies an interpreter
174- script, then interpreter will be invoked with the following argu‐
175- ments:
176-
177- interpreter [optional-arg] filename arg...
178-
179- where arg... is the series of words pointed to by the argv argu‐
180- ment of execve(), starting at argv[1].
181-
182- For portable use, optional-arg should either be absent, or be
183- specified as a single word (i.e., it should not contain white
184- space); see NOTES below.
185-
186- Since Linux 2.6.28, the kernel permits the interpreter of a script
187- to itself be a script. This permission is recursive, up to a
188- limit of four recursions, so that the interpreter may be a script
189- which is interpreted by a script, and so on.
--
343: Interpreter scripts
344- A maximum line length of 127 characters is allowed for the first
345- line in an interpreter scripts.
346-
347- The semantics of the optional-arg argument of an interpreter
348- script vary across implementations. On Linux, the entire string
349- following the interpreter name is passed as a single argument to
350- the interpreter, and this string can include white space. How‐
351- ever, behavior differs on some other systems. Some systems use
352- the first white space to terminate optional-arg. On some systems,
353- an interpreter script can have multiple arguments, and white spa‐
354- ces in optional-arg are used to delimit the arguments.
355-
356- Linux ignores the set-user-ID and set-group-ID bits on scripts.
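The argument handling described in that man page excerpt can be sketched in Python. This is only an illustration of the Linux rule (everything after the interpreter name becomes a single argument), not the kernel's code, and the function name linux_shebang_argv is made up.

```python
def linux_shebang_argv(first_line, script_path, args):
    """Build the argv an interpreter would receive under the Linux rules
    quoted above: interpreter [optional-arg] filename arg..."""
    if not first_line.startswith("#!"):
        raise ValueError("not an interpreter script")
    # Drop the "#!" marker and surrounding blanks.
    rest = first_line[2:].strip()
    if not rest:
        raise ValueError("no interpreter given")
    # Split off the interpreter only; the remainder stays in one piece.
    parts = rest.split(None, 1)
    argv = [parts[0]]
    if len(parts) == 2:
        argv.append(parts[1])
    argv.append(script_path)
    argv.extend(args)
    return argv


print(linux_shebang_argv("#! /tmp/interpreter.py\n", "./script.xyz", []))
print(linux_shebang_argv("#!/usr/bin/env python3 -u\n", "s.py", ["a"]))
```

The second call shows why multi-word optional-arg is not portable: on Linux the interpreter sees "python3 -u" as one argument, while systems that split on white space would pass two.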
From Solaris 11 exec(2) man page:
An interpreter file begins with a line of the form
#! pathname [arg]
where pathname is the path of the interpreter, and arg is an
optional argument. When an interpreter file is executed, the
system invokes the specified interpreter. The pathname
specified in the interpreter file is passed as arg0 to the
interpreter. If arg was specified in the interpreter file,
it is passed as arg1 to the interpreter. The remaining
arguments to the interpreter are arg0 through argn of the
originally exec'd file. The interpreter named by pathname
must not be an interpreter file.
As the last sentence states, chaining interpreters is not supported at all on Solaris. Trying it results in the last non-script interpreter (such as /usr/bin/python3) interpreting the first script (such as /tmp/script.xyz), so the final command line becomes /usr/bin/python3 /tmp/script.xyz, without any chaining.
So chaining script interpreters is not portable at all.
For every recipe in a target, make invokes a separate subshell to execute it.
So I tried the command strace make all 2>&1 | grep fork,
but did not get any matches for the fork system call. Where am I wrong?
You don't see calls to fork because the actual system call is clone. You'll see this if you inspect the output of strace. I like to save the strace output in a file and look at it afterwards:
strace -o trace make all
If I have a Makefile that looks like this:
three: one two
	cat one two > three
one:
	date > one
two:
	date > two
Then after running the strace command above, I have:
$ grep clone trace
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a3570ce50) = 29836
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a3570ce50) = 29838
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a3570ce50) = 29840
From the fork man page:
C library/kernel differences
Since version 2.3.3, rather than invoking the kernel's fork() system
call, the glibc fork() wrapper that is provided as part of the NPTL
threading implementation invokes clone(2) with flags that provide the
same effect as the traditional system call. (A call to fork() is
equivalent to a call to clone(2) specifying flags as just SIGCHLD.)
The glibc wrapper invokes any fork handlers that have been
established using pthread_atfork(3).
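The same effect can be reproduced without make. Below is a minimal sketch in Python, assuming a Linux system with glibc, where os.fork goes through the libc fork() wrapper described above. The helper name run_in_child is made up for illustration.

```python
import os


def run_in_child(exit_code):
    """Fork a child that exits immediately; return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: leave without running the parent's cleanup handlers.
        os._exit(exit_code)
    # Parent: reap the child and decode its status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_in_child(7))
```

Running this under strace -f should show a clone call rather than fork at the point where the child is created, matching the grep output above.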
I'm reading Linux Kernel Development by Robert Love, and one of the exercises is to create a system call (page 106). The problem is that I am unable to find the system call table file in v3.9 for the x86_32 architecture. I know that he's using version 2.6.xx, but that version is pretty old and I don't know if it will work with the distribution I'm using, so I would prefer v3.9.
More information:
The exercise of which I am speaking is the following:
Add an entry to the end of the system call table. This needs to be done for each architecture that supports the system call (which, for most calls, is all the architectures). The position of the syscall in the table, starting at zero, is its system call number. For example, the tenth entry in the list is assigned syscall number nine.
Solved using the following approach:
The system call table is located in arch/x86/syscalls/syscall_32.tbl for the x86 architecture. Thanks to Sudip Mukherjee for his help.
Another approach is the following:
http://lists.kernelnewbies.org/pipermail/kernelnewbies/2013-July/008598.html
Thanks to Srinivas Ganji for his help too.
Since Linux kernel 4.2, the system call table has moved from arch/x86/syscalls/syscall_64.tbl to arch/x86/entry/syscalls/syscall_64.tbl.
Here is the corresponding commit:
commit 1f57d5d85ba7f1f467173ff33f51d01a91f9aaf1
Author: Ingo Molnar <mingo@kernel.org>
Date: Wed Jun 3 18:36:41 2015 +0200
x86/asm/entry: Move the arch/x86/syscalls/ definitions to arch/x86/entry/syscalls/
The build time generated syscall definitions are entry code related, move
them into the arch/x86/entry/ directory.
Create a testing folder in src root: src/linux-3.4/testing/, then put inside this folder:
- a file that contains syscall code: strcpy.c
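#include <linux/linkage.h>
#include <linux/kernel.h>

/* Toy example only: a real syscall must not dereference user pointers
 * directly; it should use copy_from_user()/copy_to_user() instead. */
asmlinkage long sys_strcpy(char *dest, char *src)
{
	int i = 0;

	while (src[i] != '\0') {
		/* Copy first, then advance: dest[i]=src[i++] is undefined behavior. */
		dest[i] = src[i];
		i++;
	}
	dest[i] = '\0';
	printk(KERN_INFO "Done it\n");
	return 0;
}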
and the Makefile that contains just the following line:
obj-y:=strcpy.o
Add an entry to the syscall table and the prototype of the function:
- edit the file src/linux-3.4/arch/x86/syscalls/syscall_32.tbl by adding this line at entry 223, which is free
223 i386 strcpy sys_strcpy
Edit the file src/linux-3.4/include/linux/syscalls.h by adding the prototype of the function
asmlinkage long sys_strcpy(char *dest, char *src);
Edit the main Makefile in the src root (src/linux-3.4/Makefile) by adding the testing folder created before, as follows:
core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ testing/
For systems where audit is enabled, a table of the syscalls may be easily retrieved with:
ausyscall --dump
For example:
$ ausyscall --dump
Using x86_64 syscall table:
0 read
1 write
2 open
3 close
4 stat
5 fstat
6 lstat
7 poll
8 lseek
9 mmap
10 mprotect
...SNIP...
A similar question on SO where the OP seems to have solved it:
New syscall not found (linux kernel 3.0.0) where should I start looking?
The file seems to be arch/x86/kernel/syscall_table_32.S.
How can I check the umask of a program which is currently running?
[update: another process, not the current process.]
You can attach gdb to a running process and then call umask in the debugger:
(gdb) attach <your pid>
...
(gdb) call umask(0)
[Switching to Thread -1217489200 (LWP 11037)]
$1 = 18 # this is the umask
(gdb) call umask(18) # reset umask
$2 = 0
(gdb)
(note: 18 corresponds to a umask of 022 in this example)
This suggests that there may be a really ugly way to get the umask using ptrace.
Beginning with Linux kernel 4.7, the umask is available in /proc/<pid>/status.
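A minimal sketch of reading that field from Python, assuming a kernel of 4.7 or later and the usual "Umask:" line with an octal value. The function names parse_umask and umask_of are made up for illustration.

```python
import os


def parse_umask(status_text):
    """Return the umask from the text of a /proc/<pid>/status file,
    or None if there is no Umask line (kernels before 4.7)."""
    for line in status_text.splitlines():
        if line.startswith("Umask:"):
            # The value is printed in octal, e.g. "Umask:\t0022".
            return int(line.split(":", 1)[1].strip(), 8)
    return None


def umask_of(pid):
    """Read the umask of another process without attaching to it."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_umask(status_file.read())


if __name__ == "__main__":
    mask = umask_of(os.getpid())
    print("no Umask line" if mask is None else f"{mask:04o}")
```

Unlike the gdb approach, this never changes the target's umask, not even briefly.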
From the GNU C Library manual:
Here is an example showing how to read the mask with umask
without changing it permanently:
mode_t
read_umask (void)
{
  mode_t mask = umask (0);
  umask (mask);
  return mask;
}
However, it is better to use getumask if you just want to read
the mask value, because it is reentrant (at least if you use the
GNU operating system).
getumask is glibc-specific, though. So if you value portability, then the non-reentrant solution is the only one there is.
Edit: I've just grepped for ->umask all through the Linux source code. There is nowhere that will get you the umask of a different process. Also, there is no getumask; apparently that's a Hurd-only thing.
If you're the current process, you can write a file to /tmp and check its permissions. A better solution is to call umask(2) with zero (the function returns the setting prior to the call) and then restore it by passing that value back into umask.
The umask for another process doesn't seem to be exposed.
A colleague just showed me this command line pattern for this. I always have emacs running, so that's in the example below. The perl is my contribution:
sudo gdb --pid=$(pgrep emacs) --batch -ex 'call/o umask(0)' -ex 'call umask($1)' 2> /dev/null | perl -ne 'print("$1\n")if(/^\$1 = (\d+)$/)'
On Fedora Core 7, I'm writing some code that relies on ARG_MAX. However, even if I #include <limits.h>, the constant is still not defined. My investigations show that it's present in <sys/linux/limits.h>, but this is supposed to be portable across Win32/Mac/Linux, so directly including it isn't an option. What's going on here?
The reason it's not in limits.h is that it's not a quantity giving the limits of the value range of an integral type based on bit width on the current architecture. That's the role assigned to limits.h by the ISO standard.
The value in which you're interested is not hardware-bound in practice and can vary from platform to platform and perhaps system build to system build.
The correct thing to do is to call sysconf and ask it for _SC_ARG_MAX (_POSIX_ARG_MAX, by contrast, is only the minimum value that POSIX guarantees). I think that's the POSIX-compliant solution anyway.
According to my documentation, you include unistd.h, limits.h, or both, depending on which values you're requesting.
One other point: many implementations of the exec family of functions fail with E2BIG or a similar error if you try to call them with an oversized argument list or environment. This is one of the defined conditions under which exec can actually return.
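Both points can be tried out from Python, which wraps sysconf and reports a failed exec as an OSError. This is a rough sketch, and it assumes the system rejects a single 64 MiB argument, which should hold on Linux and macOS but is not guaranteed everywhere. The helper name exec_errno is made up for illustration.

```python
import errno
import os
import subprocess
import sys


def exec_errno(argument_size):
    """Try to exec a child with one argument of the given size.

    Returns None if the exec worked, otherwise the errno it failed with.
    """
    try:
        subprocess.run([sys.executable, "-c", "pass", "x" * argument_size])
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    # Ask the running system for its limit instead of relying on a header.
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
    print("small argument:", exec_errno(16))
    print("huge argument:", errno.errorcode.get(exec_errno(64 * 1024 * 1024)))
```

The limit is queried at run time here, so the same script reports whatever the system it runs on allows.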
For the edification of future people like myself who find themselves here after a web search for "arg_max posix", here is a demonstration of the POSIXly-correct method for discerning ARG_MAX on your system that Thomas Kammeyer refers to in his answer:
cc -x c <(echo '
#include <unistd.h>
#include <stdio.h>
int main() { printf("%li\n", sysconf(_SC_ARG_MAX)); }
')
This uses the process substitution feature of Bash; put the same lines in a file and run cc thefile.c if you are using some other shell.
Here's the output for macOS 10.14:
$ ./a.out
262144
Here's the output for a RHEL 7.x system configured for use in an HPC environment:
$ ./a.out
4611686018427387903
$ ./a.out | numfmt --to=iec-i # 'numfmt' from GNU coreutils
4.0Ei
For contrast, here is the method prescribed by https://porkmail.org/era/unix/arg-max.html, which uses the C preprocessor:
cpp <<HERE | tail -1
#include <limits.h>
ARG_MAX
HERE
This does not work on Linux for reasons still not entirely clear to me—I am not a systems programmer and not conversant in the POSIX or ISO specs—but probably explained above.
ARG_MAX is defined in /usr/include/linux/limits.h. My linux kernel version is 3.2.0-38.