Cannot open uid_map for writing from an app with cap_setuid capability set - linux

While toying around with an example from user_namespaces(7), I've come across a strange behaviour.
What the application does
The application user-ns-ex calls clone(2) with CLONE_NEWUSER, thus creating a new process in a new user namespace. The parent process writes a map (0 1000 1) to the /proc/<pid>/uid_map file and tells the child (via a pipe) that it can proceed. The child process then execs bash.
I've copied the source code here.
The problem
The application can open /proc/<pid>/uid_map for writing if I give it either no capabilities or all of them.
When I set only cap_setuid, cap_setgid, and optionally cap_sys_admin, the call to open(2) fails:
Set caps:
arksnote linux-namespaces # setcap 'cap_setuid,cap_setgid,cap_sys_admin=epi' ./user-ns-ex
arksnote linux-namespaces # getcap ./user-ns-ex
./user-ns-ex = cap_setgid,cap_setuid,cap_sys_admin+eip
Try to run:
kamyshev#arksnote ~/workspace/personal/linux-kernel/linux-namespaces $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19666
ERROR: open /proc/19666/uid_map: Permission denied
About to exec bash
And now a successful case:
No capabilities:
arksnote linux-namespaces # setcap '=' ./user-ns-ex
arksnote linux-namespaces # getcap ./user-ns-ex
./user-ns-ex =
Runs Ok:
kamyshev#arksnote ~/workspace/personal/linux-kernel/linux-namespaces $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19557
About to exec bash
arksnote linux-namespaces # exit
I've been trying to find the reason in the man pages and by experimenting with different capabilities, but with no luck so far. What puzzles me most is that the application works with fewer capabilities but not with more.
Can someone help me and clarify the issue?

The research
I have found the reason. During my research I found that the uid_map file cannot be opened because its ownership is changed to root.
Unprivileged process, no capabilities:
parent(m): capabilities: '='
parent(m): file /proc/4644/uid_map owner uid: 1000
parent(m): file /proc/4644/uid_map owner gid: 1000
Unprivileged process, capabilities are set (cap_setuid=pe):
parent(m): capabilities: '= cap_setuid+ep'
parent(m): file /proc/4644/uid_map owner uid: 0
parent(m): file /proc/4644/uid_map owner gid: 0
ERROR: open /proc/4668/uid_map: Permission denied
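The owner checks in the output above can be reproduced with stat(2). The following is a minimal sketch, not the code that produced that output; the helper name report_proc_owner is hypothetical. It prints the owner of /proc/<pid>/<entry> next to the caller's effective UID and returns the owner UID, or -1 on error.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Print the owner of /proc/<pid>/<entry> (e.g. "uid_map") and return its
   UID, or -1 if the file cannot be stat()ed. For a dumpable process the
   owner is the process's euid; for a non-dumpable one it is root. */
long report_proc_owner(pid_t pid, const char *entry)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof path, "/proc/%ld/%s", (long)pid, entry);
    if (stat(path, &st) == -1) {
        perror(path);
        return -1;
    }
    printf("file %s owner uid: %lu gid: %lu (my euid: %lu)\n", path,
           (unsigned long)st.st_uid, (unsigned long)st.st_gid,
           (unsigned long)geteuid());
    return (long)st.st_uid;
}
```

In the parent of user-ns-ex, one would call report_proc_owner(child_pid, "uid_map") right before the open(2) that fails.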
The following research has led me to this topic: what causes proc pid resources to become owned by root?
The rules for the "dumpable" flag
This is what happens:
1) When a process is not dumpable, its /proc/<pid> inodes are given a root ownership:
// fs/proc/base.c
struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *task)
...
if (task_dumpable(task)) {
rcu_read_lock();
cred = __task_cred(task);
inode->i_uid = cred->euid;
inode->i_gid = cred->egid;
rcu_read_unlock();
}
2) The process is dumpable only when its "dumpable" attribute has the value 1 (SUID_DUMP_USER); see ptrace(2).
3) prctl(2) clarifies the situation further:
Normally, this flag is set to 1. However, it is reset to the
current value contained in the file /proc/sys/fs/suid_dumpable
(which by default has the value 0), in the following
circumstances:
* The process's effective user or group ID is changed.
* The process's filesystem user or group ID is changed (see
credentials(7)).
* The process executes (execve(2)) a set-user-ID or set-
group-ID program, resulting in a change of either the
effective user ID or the effective group ID.
* The process executes (execve(2)) a program that has file
capabilities (see capabilities(7)), but only if the
permitted capabilities gained exceed those already
permitted for the process.
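Both values in that description can be inspected from C. A minimal sketch (the helper names are hypothetical): get_dumpable() queries the current attribute with PR_GET_DUMPABLE, and read_suid_dumpable() reads the sysctl value that the attribute is reset to.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Current "dumpable" attribute of the calling process: 0, 1 or 2,
   or -1 on error. */
int get_dumpable(void)
{
    int d = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
    if (d == -1)
        perror("prctl(PR_GET_DUMPABLE)");
    return d;
}

/* Value of /proc/sys/fs/suid_dumpable, i.e. what the attribute is reset
   to after one of the credential changes listed above; -1 on error. */
int read_suid_dumpable(void)
{
    FILE *f = fopen("/proc/sys/fs/suid_dumpable", "r");
    int value = -1;

    if (f == NULL) {
        perror("/proc/sys/fs/suid_dumpable");
        return -1;
    }
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling get_dumpable() at the start of user-ns-ex would be expected to return 1 with setcap '=' and 0 with the cap_setuid file capabilities.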
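Thus my problem arose from the last of the above rules. (With all capabilities set, the process becomes non-dumpable too, but CAP_DAC_OVERRIDE lets it open the root-owned uid_map file anyway, which is why that case works.) The rule is enforced in commit_creds() in kernel/cred.c: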
int commit_creds(struct cred *new)
<...>
/* dumpability changes */
if (!uid_eq(old->euid, new->euid) ||
!gid_eq(old->egid, new->egid) ||
!uid_eq(old->fsuid, new->fsuid) ||
!gid_eq(old->fsgid, new->fsgid) ||
!cred_cap_issubset(old, new)) {
if (task->mm)
set_dumpable(task->mm, suid_dumpable);
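The last condition in that test can be illustrated with plain bit masks. This is a simplified user-space model, not kernel code: it treats permitted sets as 64-bit masks and ignores the user-namespace handling that the real cred_cap_issubset() also performs. The capability numbers are the ones defined in <linux/capability.h>; the names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum { NUM_CAP_SETGID = 6, NUM_CAP_SETUID = 7, NUM_CAP_SYS_ADMIN = 21 };

#define CAPBIT(n) (UINT64_C(1) << (n))

/* Simplified model of !cred_cap_issubset(old, new): true if the new
   permitted set contains any capability the old permitted set lacked,
   which is when commit_creds() resets the dumpable flag. */
bool permitted_caps_grow(uint64_t old_permitted, uint64_t new_permitted)
{
    return (new_permitted & ~old_permitted) != 0;
}
```

With setcap 'cap_setuid,cap_setgid,cap_sys_admin=epi', an unprivileged caller starts with an empty permitted set, so the set grows and the flag is reset; with setcap '=' nothing is gained and the flag stays at 1.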
Fixes
There are a number of ways to overcome the issue:
Globally change /proc/sys/fs/suid_dumpable:
echo 1 > /proc/sys/fs/suid_dumpable
Set the "dumpable" flag just for the process:
prctl(PR_SET_DUMPABLE, 1, 0, 0, 0)
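A minimal sketch of the per-process fix on the parent side, assuming the flag has to be set before clone(2): the child's /proc/<pid> files follow the child's own dumpable attribute, which it inherits from the parent. make_dumpable() and write_uid_map() are hypothetical helpers, not functions from the user_namespaces(7) example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Call in the parent BEFORE clone(): the child inherits the flag, so its
   /proc/<pid> files are owned by its euid instead of root. */
int make_dumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1) {
        perror("prctl(PR_SET_DUMPABLE)");
        return -1;
    }
    return 0;
}

/* Write a map such as "0 1000 1" to /proc/<child>/uid_map.
   Returns 0 on success, -1 on error. */
int write_uid_map(pid_t child, const char *map)
{
    char path[64];
    size_t len = strlen(map);
    int fd;

    snprintf(path, sizeof path, "/proc/%ld/uid_map", (long)child);
    fd = open(path, O_WRONLY);
    if (fd == -1) {
        fprintf(stderr, "ERROR: open %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (write(fd, map, len) != (ssize_t)len) {
        fprintf(stderr, "ERROR: write %s: %s\n", path, strerror(errno));
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

The intended order is make_dumpable(), then clone(), then write_uid_map(child_pid, "0 1000 1"), and only then telling the child through the pipe that it can proceed.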

Related

Umount("/proc") syscall for mount namespaces "Invalid Argument" error

I'm currently trying to use different namespaces for testing purposes. For this I tried to implement a mount (MNT) namespace, combined with a PID namespace, so that a program within this namespace cannot see other processes on the system.
When I try to use the umount system call like this (the same happens with umount("/proc"), or with umount2() and the MNT_FORCE flag):
if (umount2("/proc", 0)!= 0)
{
fprintf(stderr, "Error when unmounting /proc: %s\n",strerror(errno));
printf("\tKernel version might be incorrect\n");
exit(-1);
}
the call fails with error number 22, EINVAL ("Invalid argument").
This code snippet is called within the function that runs in the child process created with the new namespaces:
pid_t child_pid = clone(child_exec, child_stack+1024*1024, Child_Flags,&args);
(the child_exec function). Flags are set as following:
int Child_Flags = CLONE_NEWIPC | CLONE_NEWUSER | CLONE_NEWUTS | CLONE_NEWNET |CLONE_NEWPID | CLONE_NEWNS |SIGCHLD ;
With the CLONE_NEWNS for a new mount namespace (http://man7.org/linux/man-pages/man7/namespaces.7.html)
Output of the program is as follows:
Testing with Isolation
Starting Container engine
In-Child-PID: 1
Error number 22
Error when unmounting /proc: Invalid argument
Can somebody point out my error so I can unmount the folder? Thank you in advance.
You can't unmount things that were mounted in a different user namespace except by using pivot_root followed by umount to unmount /. You can overmount /proc without unmounting the old /proc.
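A minimal sketch of the overmount approach, assuming the child was created with CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS as in the question; overmount_proc() is a hypothetical name. Instead of calling umount2("/proc", 0), the child mounts a new proc instance on top of the inherited one.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Mount a new proc instance over `target` (normally "/proc") instead of
   unmounting the inherited one. Meant to run in the clone()d child, so the
   new instance shows only the child's PID namespace.
   Returns 0 on success, -1 on error. */
int overmount_proc(const char *target)
{
    /* nosuid/nodev/noexec mirror the flags /proc is usually mounted with. */
    if (mount("proc", target, "proc", MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL) == -1) {
        fprintf(stderr, "Error when mounting proc on %s: %s\n",
                target, strerror(errno));
        return -1;
    }
    return 0;
}
```

In child_exec, overmount_proc("/proc") would take the place of the umount2() call; the old /proc stays mounted underneath but is hidden.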

"Permission denied" reading from a process-substitution FIFO in an unprivileged child process

Consider the following, observed with bash 4.4 on a Linux 3.19 kernel:
# in reality, this may access files "nobody" isn't allowed
get_a_secret() { printf '%s\n' "This is a secret"; }
# attach a process substitution reading the secret to FD 10
exec 10< <(get_a_secret)
# run a less-privileged program that needs the secret, passing it the file descriptor.
chpst -u nobody:nobody -- cat /dev/fd/10
...or the shorter/simpler:
chpst -u nobody:nobody -- cat <(get_a_secret)
Either fails in a manner akin to the following:
cat: /proc/self/fd/10: Permission denied
So, two branches to this question:
What's going on here?
Is there a way to get the desired behavior (passing the ability to read the secret through to the single child process being invoked in a way that doesn't persistently expose that secret to other processes running as "nobody") without exposing the FIFO's output to other processes?
(Yes, I'm well aware that I need to lock down ptrace and /proc/*/mem to prevent another process running as "nobody" from pulling the secret out of the client as it's being read; that said, that's (1) something I can do, and (2) when the process is only run before any potentially-attacker-controlled executables are invoked, less exposure than allowing any process running as nobody to pull the secret out of /proc/*/environ for the full duration of that process).
The following workaround avoids this issue:
exec 10< <(get_a_secret)
chpst -u nobody:nobody -- sh -c 'cat <&10'
Note that the redirection is written as <&10, not as </dev/fd/10 or </proc/self/fd/10, at least on platforms that provide /dev/fd; on platforms without this facility, bash rewrites the /dev/fd form into a dup2() call.
An answer with an explanation of the behavior (and perhaps a workaround that allows programs that don't accept an FD number as input to act on the read side) would be in a position to supersede this one. :)
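A likely explanation: /dev/fd/10 resolves through /proc/self/fd/10 to the pipe itself, and open(2) on it checks the pipe inode's permissions. Linux creates pipe inodes with mode 0600, owned by the user that created them, so "nobody" is refused, while an inherited, already-open descriptor needs no such check. For programs that do accept a descriptor number, a minimal C sketch of the <&10 approach follows; copy_fd() is a hypothetical helper that reads the inherited descriptor directly and never reopens it by path.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything readable from the inherited descriptor `fd` to `out`,
   like `cat <&10`: the already-open pipe is read directly instead of being
   reopened through /dev/fd/10. Returns 0 on success, -1 on error. */
int copy_fd(int fd, int out)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) != 0) {
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w == -1) {
                if (errno == EINTR)
                    continue;           /* interrupted: retry the write */
                return -1;
            }
            off += w;
        }
    }
    return 0;
}
```

A hypothetical consumer started as chpst -u nobody:nobody -- ./consumer 10 could call copy_fd(atoi(argv[1]), STDOUT_FILENO).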

Cannot strace sudo; reports that effective uid is nonzero

command:
bigxu#bigxu-ThinkPad-T410 ~/work/lean $ sudo ls
content_shell.pak leanote libgcrypt.so.11 libnotify.so.4 __MACOSX resources
icudtl.dat leanote.png libnode.so locales natives_blob.bin snapshot_blob.bin
Most of the time it works, but sometimes it is very slow.
So I ran it under strace.
command:
bigxu#bigxu-ThinkPad-T410 ~/work/lean $ strace sudo ls
execve("/usr/bin/sudo", ["sudo", "ls"], [/* 66 vars */]) = 0
brk(0) = 0x7f2b3c423000
fcntl(0, F_GETFD) = 0
fcntl(1, F_GETFD) = 0
fcntl(2, F_GETFD) = 0
......
......
......
write(2, "sudo: effective uid is not 0, is"..., 140sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
) = 140
exit_group(1) = ?
+++ exited with 1 +++
Other information:
bigxu-ThinkPad-T410 lean # ls /etc/sudoers -alht
-r--r----- 1 root root 745 2月 11 2014 /etc/sudoers
bigxu-ThinkPad-T410 lean # ls /usr/bin/sudo -alht
-rwsr-xr-x 1 root root 152K 12月 14 21:13 /usr/bin/sudo
bigxu-ThinkPad-T410 lean # df `which sudo`
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 67153528 7502092 56217148 12%
For security reasons, the setuid bit and ptrace (used to run binaries under a debugger) cannot both be honored at the same time. Failure to enforce this restriction in the past led to CVE-2001-1384.
Consequently, any operating system designed with an eye to security will either stop honoring ptrace on exec of a setuid binary, or fail to honor the setuid bit when ptrace is in use.
On Linux, consider using Sysdig instead: it can only observe behavior, not modify it, so it does not carry the same risks.
How to trace sudo
$ sudo strace -u <username> sudo -k <command>
sudo runs strace as root.
strace runs sudo as <username> passed via the -u option.
sudo drops cached credentials from the previous sudo with -k option (for asking the password again) and runs <command>.
The second sudo is the tracee (the process being traced).
To substitute the current user for <username> automatically, use $(id -u -n).
Why sudo does not work with strace
In addition to this answer by Charles, here is what execve() manual page says:
If the set-user-ID bit is set on the program file referred to by pathname, then the effective user ID of the calling process is changed to that of the owner of the program file. Similarly, when the set-group-ID bit of the program file is set the effective group ID of the calling process is set to the group of the program file.
The aforementioned transformations of the effective IDs are not performed (i.e., the set-user-ID and set-group-ID bits are ignored) if any of the following is true:
the no_new_privs attribute is set for the calling thread (see prctl(2));
the underlying filesystem is mounted nosuid (the MS_NOSUID flag for mount(2)); or
the calling process is being ptraced.
The capabilities of the program file (see capabilities(7)) are also ignored if any of the above are true.
The permissions for tracing a process, inspecting or modifying its memory, are described in subsection Ptrace access mode checking in section NOTES of ptrace(2) manual page. I've commented about this in this answer.
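The three conditions in that excerpt can be checked from inside a process. A minimal sketch (setid_ignored_reasons() and the IGNORED_* names are hypothetical): it reports no_new_privs via PR_GET_NO_NEW_PRIVS, a nosuid mount via statvfs(3) on the program's path, and an attached tracer via the TracerPid field of /proc/self/status.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/statvfs.h>

/* Bit flags for the three conditions listed in the execve(2) excerpt. */
enum {
    IGNORED_NO_NEW_PRIVS = 1 << 0,
    IGNORED_NOSUID_MOUNT = 1 << 1,
    IGNORED_PTRACED      = 1 << 2
};

/* TracerPid from /proc/self/status: 0 if not traced, -1 on error. */
static long tracer_pid(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long pid = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (sscanf(line, "TracerPid: %ld", &pid) == 1)
            break;
    fclose(f);
    return pid;
}

/* Bitmask of reasons why executing the set-user-ID program at `path`
   would NOT change the effective UID; 0 means none apply. A failing
   statvfs() is treated as "not mounted nosuid". */
int setid_ignored_reasons(const char *path)
{
    struct statvfs vfs;
    int reasons = 0;

    if (prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1)
        reasons |= IGNORED_NO_NEW_PRIVS;
    if (statvfs(path, &vfs) == 0 && (vfs.f_flag & ST_NOSUID))
        reasons |= IGNORED_NOSUID_MOUNT;
    if (tracer_pid() > 0)
        reasons |= IGNORED_PTRACED;
    return reasons;
}
```

Run under strace, setid_ignored_reasons("/usr/bin/sudo") would be expected to include IGNORED_PTRACED, matching the "effective uid is not 0" message.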

setuid on an executable doesn't seem to work

I wrote a small C utility called killSPR to kill the following processes on my RHEL box. The idea is that anyone who logs into this Linux box can use this utility to kill the processes listed below (which doesn't work, as explained below).
cadmn#rhel /tmp > ps -eaf | grep -v grep | grep " SPR "
cadmn 5822 5821 99 17:19 ? 00:33:13 SPR 4 cadmn
cadmn 10466 10465 99 17:25 ? 00:26:34 SPR 4 cadmn
cadmn 13431 13430 99 17:32 ? 00:19:55 SPR 4 cadmn
cadmn 17320 17319 99 17:39 ? 00:13:04 SPR 4 cadmn
cadmn 20589 20588 99 16:50 ? 01:01:30 SPR 4 cadmn
cadmn 22084 22083 99 17:45 ? 00:06:34 SPR 4 cadmn
cadmn#rhel /tmp >
This utility is owned by the user cadmn (under which these processes run) and has the setuid flag set on it (shown below).
cadmn#rhel /tmp > ls -l killSPR
-rwsr-xr-x 1 cadmn cusers 9925 Dec 17 17:51 killSPR
cadmn#rhel /tmp >
The C code is given below:
/*
 * Program Name: killSPR.c
 * Description: A simple program that kills all SPR processes that
 * run as user cadmn
 */
#include <stdio.h>
#include <stdlib.h> /* system() */

int main(void)
{
    char input[2]; /* was an uninitialized pointer; fgets() needs real storage */

    printf("Before you proceed, find out under which ID I'm running. Hit enter when you are done...");
    fflush(stdout);
    fgets(input, sizeof input, stdin);
    const char *killCmd = "kill -9 $(ps -eaf | grep -v grep | grep \" SPR \" | awk '{print $2}')";
    system(killCmd);
    return 0;
}
A user (pmn) different from cadmn tries to kill the above-mentioned processes with this utility and fails (shown below):
pmn#rhel /tmp > ./killSPR
Before you proceed, find out under which ID I'm running. Hit enter when you are done...
sh: line 0: kill: (5822) - Operation not permitted
sh: line 0: kill: (10466) - Operation not permitted
sh: line 0: kill: (13431) - Operation not permitted
sh: line 0: kill: (17320) - Operation not permitted
sh: line 0: kill: (20589) - Operation not permitted
sh: line 0: kill: (22084) - Operation not permitted
pmn#rhel /tmp >
While the user waits to hit Enter, inspecting the killSPR process shows that it is running as the user cadmn (shown below); even so, killSPR is unable to terminate the processes.
cadmn#rhel /tmp > ps -eaf | grep -v grep | grep killSPR
cadmn 24851 22918 0 17:51 pts/36 00:00:00 ./killSPR
cadmn#rhel /tmp >
By the way, none of the main partitions are mounted with nosuid:
pmn#rhel /tmp > mount | grep nosuid
pmn#rhel /tmp >
The setuid flag on the executable doesn't seem to have the desired effect. What am I missing here? Have I misunderstood how setuid works?
First and foremost, the setuid bit only changes the effective UID when the program is executed. The program still needs to call setuid() or setreuid() to make its real UID match the effective one; otherwise, a shell started with system() sees that its real and effective UIDs differ and runs as the user who invoked the program.
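A minimal sketch of calling setreuid() as suggested, assuming the binary is owned by cadmn with the setuid bit set, as in the question; adopt_effective_uid() is a hypothetical name. It would be called at the top of main(), before system().

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Make the real UID equal to the effective UID granted by the setuid bit,
   so that a shell started later (e.g. by system()) does not see
   real != effective and drop back to the invoking user.
   Returns 0 on success, -1 on error. */
int adopt_effective_uid(void)
{
    uid_t euid = geteuid();

    if (setreuid(euid, euid) == -1) {
        perror("setreuid");
        return -1;
    }
    return 0;
}
```

Because the real UID changes, the saved set-user-ID is also set to the effective UID, so the process cannot switch back to the invoking user afterwards.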
Avoid system() (and exec'ing a shell), since bash drops the extra privileges for security reasons. You can use kill() to kill the processes directly.
Check these out:
http://linux.die.net/man/2/setuid
http://man7.org/linux/man-pages/man2/setreuid.2.html
http://man7.org/linux/man-pages/man2/kill.2.html
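A sketch of the kill() route, assuming the target processes' command name (as shown in /proc/<pid>/comm) is SPR; kill_by_comm() is a hypothetical helper. Because no shell is involved, the effective UID granted by the setuid bit is used directly: kill(2) allows the signal when the sender's real or effective UID matches the target's real or saved set-user-ID.

```c
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if /proc/<pid_dir>/comm equals `name`, 0 otherwise. */
static int comm_matches(const char *pid_dir, const char *name)
{
    char path[300], comm[64];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%s/comm", pid_dir);
    if ((f = fopen(path, "r")) == NULL)
        return 0;                       /* process may have exited */
    if (fgets(comm, sizeof comm, f) == NULL) {
        fclose(f);
        return 0;
    }
    fclose(f);
    comm[strcspn(comm, "\n")] = '\0';
    return strcmp(comm, name) == 0;
}

/* Send `sig` to every process whose command name is `name`. Returns the
   number of processes signalled, or -1 if /proc cannot be read. */
int kill_by_comm(const char *name, int sig)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (proc == NULL)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                   /* not a PID directory */
        if (!comm_matches(de->d_name, name))
            continue;
        pid_t pid = (pid_t)strtol(de->d_name, NULL, 10);
        if (kill(pid, sig) == 0)
            count++;
        else
            fprintf(stderr, "kill(%ld): %s\n", (long)pid, strerror(errno));
    }
    closedir(proc);
    return count;
}
```

In killSPR, main() would then reduce to a call such as kill_by_comm("SPR", SIGKILL).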
You should replace your system() call with an exec call. The manual page for system() says that it drops privileges when run from a setuid program.
The reason is explained in man system:
Do not use system() from a program with set-user-ID or set-group-ID privileges, because strange values for some environment variables might be used to subvert system integrity. Use the exec(3) family of functions instead, but not execlp(3) or execvp(3). system() will not, in fact, work properly from programs with set-user-ID or set-group-ID privileges on systems on which /bin/sh is bash version 2, since bash 2 drops privileges on startup. (Debian uses a modified bash which does not do this when invoked as sh.)
If you replace system() with exec, you will not be able to use shell syntax unless you call /bin/sh -c <shell command>, which is what system() actually does.
Check out this link on making a shell script a daemon:
Best way to make a shell script daemon?
You might also want to search for 'linux script to service'; I found a couple of links on this subject.
The idea is that you wrap a shell script that has some basic stuff in it that allows a user to control a program run as another user by calling a 'service' type script instead. For example, you could wrap up /usr/var/myservice/SPRkiller as a 'service' script that could then just be called as such from any user: service SPRkiller start, then SPRkiller would run, kill the appropriate services (assuming the SPR 'program' is run as a non-root user).
This is what it sounds like you are trying to achieve. Running a program (shell script/C program/whatever) carries the same user restrictions on it no matter what (except for escalation bugs/hacks).
On a side note, you seem to have a slight misunderstanding of user rights on Linux/Unix as well as what certain commands and functions do. If a user does not have permission to do a certain action (like kill the process of another user), then calling setuid on the program you want to kill (or on kill itself) will have no effect because the user does not have permission to another user's 'space' without superuser rights. So even if you're in a shell script or a C program and call the same system command, you will get the same effect.
http://www.linux.com/learn/ is a great resource, and here's a link for file permissions
Hope that helps.

Can dmidecode command be invoked successfully by SUID program in Linux?

The OS is SuSE Linux, kernel 2.6.16.60-0.21-smp.
I have an executable binary named bmu that has the SUID bit set, as shown below:
-rwsr-sr-x 1 root root 14968899 2012-03-29 10:35 bmu
This program invokes dmidecode internally.
It works when run by root, but dmidecode returns nothing when the program is run by a non-root user.
What is the reason for this problem, and how can I fix it?
Edit: Added code and description from comment:
read_fp = popen("dmidecode | grep 'Product Name'", "r");
/* ...... */
chars_read = fread(buffer, sizeof(char), BUFSIZ-1, read_fp);
read_fp is not NULL, but fread() reads 0 bytes, whereas there should be some output.
The problem was solved, in an unsafe way.
It is not enough to set SUID on the bmu program; dmidecode must be SUID as well:
-rwsr-sr-x 1 root root 59504 2006-06-16 22:08 /usr/sbin/dmidecode
The dmidecode program needs access to /dev/mem, which ordinary users don't have permission for. The most common fixes for such a problem are either to do what you already did and make the program SUID, or to add the user to the kmem group (the group owning /dev/mem).
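The underlying cause looks like the one in the killSPR question: popen() runs its command through /bin/sh, and bash drops setuid privileges when the real and effective UIDs differ. One alternative to making dmidecode itself SUID is to exec it directly, without a shell, and do the grep step in C (e.g. with strstr() on the captured text). A minimal sketch; capture_output() is a hypothetical helper, and this assumes dmidecode does not drop privileges itself.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with `argv` directly (no /bin/sh), capture up to cap - 1
   bytes of its stdout into `buf`, NUL-terminate it, and return the number
   of bytes captured, or -1 on error. */
ssize_t capture_output(const char *path, char *const argv[], char *buf, size_t cap)
{
    int fds[2], status;
    ssize_t total = 0, n;
    pid_t pid;

    if (cap == 0 || pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: stdout -> pipe, then exec */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(127);
        close(fds[1]);
        execv(path, argv);
        _exit(127);                     /* only reached if execv() failed */
    }
    close(fds[1]);                      /* parent: read until EOF or full */
    while ((size_t)total < cap - 1 &&
           (n = read(fds[0], buf + total, cap - 1 - total)) > 0)
        total += n;
    close(fds[0]);
    buf[total] = '\0';
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return total;
}
```

In bmu, the popen() call could become capture_output("/usr/sbin/dmidecode", args, buffer, BUFSIZ) with args = { "dmidecode", NULL }, followed by a search for "Product Name" in buffer.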
