How does chroot-escape protection in LXC implemented - security

How does chroot-escape protection in LXC implemented? Is there guarantee, that there no way to escape from lxc container to host?
I know, that linux-vserver uses chroot-barrier for that, but it doesn't part of stock kernel, afair.

Did you see the info contained in the "Applying mount namespaces" article: http://www.ibm.com/developerworks/library/l-mount-namespaces/
Under the "Per-user root" section:
"One shortcoming of this approach is that an ordinary chroot() can be escaped, although some privilege is needed. For instance, when executed with certain capabilities including CAP_SYS_CHROOT, the source for a program to break out of chroot() (see Resources) will cause the program to escape into the real filesystem root. Depending on the actual motivation for and use of the per-user filesystem trees, this may be a problem.
We can address this problem by using pivot_root(2) in a private namespace instead of chroot(2) to change the login's root to /share/USER/root. Whereas chroot() simply points the process's filesystem root to a specified new directory, pivot_root() detaches the specified new_root directory (which must be a mount) from its mount point and attaches it to the process root directory. Since the mount tree has no parent for the new root, the system cannot be tricked into entering it like it can with chroot(). We will use the pivot_root() approach."
In short I've seen pivot_root used in combination with the mnt namespace to mitigate such concerns.

Related

How to restrict anyone (root user also) to modify file in Linux

I am developing one security base software for Linux platform using C and CPP. I want to restrict all users (even root user) also to modify the file. i.e No one can modify the file.
Modifying means no-one can write to the file, move the file or remove the file etc.
More precisely:
I have a file named as a.txt in directory /home/ and I want to do something to this file so that no one can write into this file, remove this file or move this file.
But can read the file.
I tried chattr command:
chattr +i /home/a.txt
It solved my problem for other users but when I switched to superuser i.e root user into terminal and I fired command:
chattr -i /home/a.txt
So /home/a.txt file become mutable. root user can change file immutable to mutable. So the problem is not solved for root user.
I want to do something to this file, so even root user can't modify this file.
I already ask this question at Unix&Linux and askUbuntu and did not get any answer.
In general, the definition of the root user in Unix systems is that no permission checking is done by the kernel when the user is root (with some insignificant exceptions).
So, if you want to prevent root from doing something, you have to write a kernel module that does that. Indeed, most security-related software has kernel components. And this will be harder than you think - root can basically unmount the filesystem and mount it on another machine, or boot with a kernel that doesn't include your module.
There are already a few security-related kernel modules that you can look into: SELinux, AppArmor etc. (also Tomoyo and Smack, but they don't seem helpful in this case). Depending on your requirement, they might be sufficient.
Put your app in an appliance with disk encryption.

Fuse symbolic link resolution under chroot

I am creating a fuse-based filesystem very similar to the example passthrough_fh. Where I log some statistics in my handlers before calling the underlying system call.
I use this with a debian Wheezy chroot image from debboostrap. The idea is to mirror wheezy/ into my mountpoint, then a process will chroot into the mountpoint and all activities will be recorded through my fuse fs.
The OS seems to handle path resolution with chroot nicely. That is, if the chrooted process does stat("/bin/ls"), from my fuse process I see stat("wheezy/bin/ls").
However I'm not sure how to handle symlinks. For example the file
wheezy/lib64/ld-linux-x86-64.so.2
points to
/lib/x86_64-linux-gnu/ld-2.13.so
So when I call stat("wheezy/lib64/ld-linux-x86-64.so.2") it won't just work, since the OS will try to dereference the symlink /lib/x86_64-linux-gnu/ld-2.13.so instead of the correct wheezy/lib/x86_64-linux-gnu/ld-2.13.so.
This is a simplified example, I can't just prepend wheezy/ to all paths, I want to also support applications which do not chroot, or chroot multiple times.
I can think of some less than ideals ways to do this, e.g. check /proc/pid/root/ to get the root of the process in case of chroot, but then I have to always check if a file is a symbolic link.
Is there a better way or general way fuse based file systems handle this problem?
After contacting the fuse-devel mailing list, I received the following response:
If you are performing this stat(2) for GETATTR or LOOKUP, you should
be using lstat(2) instead. This will tell the kernel that you found a
symlink and it should keep managing path resolution correctly for you.
That is, use lstat(2) when handling LOOKUP or GETATTR, use the results of lstat to fill the fuse struct. From there, the kernel will automatically handle the name resolution (even for symbolic links, and processes running inside a chroot).

Linux - use of term mount in clone man page

I asked a question to clarify what mount means in Linux.
I have a doubt in the use of this term in the clone man page:
The namespace of a process is the data (the set of mounts) describing the file
hierarchy as seen by that process.
The set of mounts - describing the file hierarchy seems misleading to me.
From what I understood based on the accepted answer, the file hierarchy could be much more than just the set of mounts, as the set of mounts will be just be the mount points where file systems were added to existing file system.
Can anyone clarify ?
If you think of “the set of mounts” as being (at least) a set of (device, mount point) pairs, rather than merely a set of mount points, then it starts to look a lot like the fstab or the output of the mount command (with no arguments), albeit without the additional information about flags and options (e.g. rw, nosuid, etc.).
Such a “set of mounts” provides complete information about what filesystems are mounted where. This is, by definition, the “mount namespace” of a process. Once you go from the traditional situation of having one global mount namespace to having per-process mount namespaces, additional questions arise when a process fork()s.
Traditionally, mounting or unmounting a filesystem changed the filesystem as seen by all processes.
With per-process mount namespaces, it is possible for a child process to have a different mount namespace from its parent. A question now arises:
Should changes to the mount namespace made by the child propagate back to the parent?
Clearly, this functionality must at least be supported and, indeed, must probably be the default. Otherwise, launching the mount command itself would effect no change (since the filesystem as seen by the parent shell would be unaffected).
Equally clearly, it must also be possible for this necessary propagation to be suppressed, otherwise we can never create a child process whose mount namespace differs from its parent, and we have one global mount namespace again (the filesystem as seen by init).
Thus, we must decide on fork() whether the child process gets its own copy of the data about mounted filesystems from the parent, which it can change without affecting the parent, or gets a pointer to the same data structures as the, which it can change (necessary for changes to propagate back, as when you launch mount from the shell).
If the CLONE_NEWNS flag is passed to clone() or fork(), the child gets a copy of its parent's mounted filesystem data, which it can change without affecting the parent's mount namespace. Otherwise, it gets a pointer to the parents data structure, where changes made by the child will be seen by the parent (so the mount command itself can work).

Regarding Hard Link

Can somebody please explain me why the kernel doesn't allow us to make a hard link to a directory. Whether it is because it breaks the rule of directed acyclic graph structure of the file-system or it is because of some other reason. What other complications come if it allows that?
Back in the days of 7th Edition (or Version 7) UNIX, there were no system calls mkdir(2) and rmdir(2). The mkdir(1) program was SUID root, and used the mknod(2) system call to create the directory and the link(2) system call to make the entries for . and .. in the new directory. The link(2) system call only allowed root to do that. Consequently, way back then (circa 1978), it was possible for the superuser to create links to directories, but only the superuser was permitted to do so to ensure that there were no problems with cycles or other missing links. There were diagnostic programs to pick up the pieces if the system crashed while a directory was partly created, for example.
You can find the Unix 7th Edition manuals at Bell Labs. Sections 2 and 3 are devoid of mkdir(2) and rmdir(2). You used the mknod(2) system call to make the directory:
NAME
mknod – make a directory or a special file
SYNOPSIS
mknod(name, mode, addr)
char *name;
DESCRIPTION
Mknod creates a new file whose name is the null-terminated string pointed to by name. The mode of
the new file (including directory and special file bits) is initialized from mode. (The protection part of
the mode is modified by the process’s mode mask; see umask(2)). The first block pointer of the i-node
is initialized from addr. For ordinary files and directories addr is normally zero. In the case of a special
file, addr specifies which special file.
Mknod may be invoked only by the super-user.
SEE ALSO
mkdir(1), mknod(1), filsys(5)
DIAGNOSTICS
Zero is returned if the file has been made; – 1 if the file already exists or if the user is not the superuser.
The entry for link(2) states:
DIAGNOSTICS
Zero is returned when a link is made; – 1 is returned when name1 cannot be found; when name2 already
exists; when the directory of name2 cannot be written; when an attempt is made to link to a directory by
a user other than the super-user; when an attempt is made to link to a file on another file system; when a
file has too many links.
The entry for unlink(2) states:
DIAGNOSTICS
Zero is normally returned; – 1 indicates that the file does not exist, that its directory cannot be written,
or that the file contains pure procedure text that is currently in use. Write permission is not required on
the file itself. It is also illegal to unlink a directory (except for the super-user).
The manual page for the ln(1) command noted:
It is forbidden to link to a directory or to link across file systems.
The manual page for the mkdir(1) command notes:
Standard entries, '.', for the directory itself, and '..'
for its parent, are made automatically.
This would not be worthy of comment were it not that it was possible to create directories without those links.
Nowadays, the mkdir(2) and rmdir(2) system calls are standard and permit any user to create and remove directories, preserving the correct semantics. There is no longer a need to permit users to create hard links to directories. This is doubly true since symbolic links were introduced - they were not in 7th Edition UNIX, but were in the BSD versions of UNIX from quite early on.
With normal directories, the .. entry unambiguously links back to the (single, solitary) parent directory. If you have two hard links (two names) for the same directory in different directories, where does the .. entry point? Presumably, to the original parent directory - and presumably there is no way to get to the 'other' parent directory from the linked directory. That's an asymmetry that can cause trouble. Normally, if you do:
chdir("./subdir");
chdir("..");
(where ./subdir is not a symbolic link), then you will be back in the directory you started from. If ./subdir is a hard link to a directory somewhere else, then you will be in a different directory from where you started after the second chdir(). You'd have to show that with a pair of stat() calls before and after the chdir() operations shown.
This is entirely because allowing hard links to directories allows for potential loops and cycles in the directory graph without adding much value.
In addition to the possibility of getting cycles (much like with symlinks, by the way, but these are easier to detect and handle), there is a second reason I can think of.
On UNIX, there is a common assumption in use by many programs, that will assume that all directories will have a link count of 2+number of child directories. This is due to the POSIX standard directory entries '.' and '..' which link to the directory or it's parent.
(After verification, I can say that the root (/) is not an exception).
This is especially useful as a performance optimization to detect leaf directories when recursing, but many applications will exist that have found other uses for it
Clarifying
By allowing 'userdefined' hardlinks to directories, these invariants so to say will no longer hold, and any dependent applications might stop working correctly.
The element of surprise is why you need root permissions (and some good design (re)thinking) in order to force creation of directory hardlinks
Because then the directory tree will cease to be a directory tree. One directory could have multiple parents.
Cyclic references will break garbage collection by reference counting. Wikipedia describes the problem:
There are a variety of ways of handling the problem of detecting and collecting reference cycles. One is that a system may explicitly forbid reference cycles.
That it the way Linux does it.

Secure access to files in a directory identified by an environment variable?

Can anyone point to some code that deals with the security of files access via a path specified (in part) by an environment variable, specifically for Unix and its variants, but Windows solutions are also of interest?
This is a big long question - I'm not sure how well it fits the SO paradigm.
Consider this scenario:
Background:
Software package PQR can be installed in a location chosen by users.
The environment variable $PQRHOME is used to identify the install directory.
By default, all programs and files under $PQRHOME belong to a special group, pqrgrp.
Similarly, all programs and files under $PQRHOME either belong to a special user, pqrusr, or to user root (and those are SUID root programs).
A few programs are SUID pqrusr; a few more programs are SGID pqrgrp.
Most directories are owned by pqrusr and belong to pqrgrp; some can belong to other groups, and the members of those groups acquire extra privileges with the software.
Many of the privileged executables must be run by people who are not members of pqrgrp; the programs have to validate that the user is permitted to run it by arcane rules that do not directly concern this question.
After startup, some of the privileged programs have to retain their elevated privileges because they are long-running daemons that may act on behalf of many users over their lifetime.
The programs are not authorized to change directory to $PQRHOME for a variety of arcane reasons.
Current checking:
The programs currently check that $PQRHOME and key directories under it are 'safe' (owned by pqrusr, belong to pqrgrp, do not have public write access).
Thereafter, programs access files under $PQRHOME via the full value of environment variable.
In particular, the G11N and L10N is achieved by accessing files in 'safe' directories, and reading format strings for printf() etc out of the files in those directories, using the full pathname derived from $PQRHOME plus a known sub-structure (for example, $PQRHOME/g11n/en_us/messages.l10n).
Assume that the 'as installed' value of $PQRHOME is /opt/pqr.
Known attack:
Attacker sets PQRHOME=/home/attacker/pqr.
This is actually a symlink to /opt/pqr, so when one of the PQR programs, call it pqr-victim, checks the directory, it has correct permissions.
Immediately after the security checking is completed successfully, the attacker changes the symlink so that it points to /home/attacker/bogus-pqr, which is clearly under the attacker's control.
Dire things happen when the pqr-victim now accesses a file under the supposedly safe directory.
Given that PQR currently behaves as described, and is a large package (multiple millions of lines of code, developed over more than a decade to a variety of coding standards, which were frequently ignored, anyway), what techniques would you use to remediate the problem?
Known options include:
Change all formatting calls to use function that checks actual arguments against the format strings, with an extra argument indicating the actual types passed to the function. (This is tricky, and potentially error prone because of the sheer number of format operations to be changed - but if the checking function is itself sound, works well.)
Establish the direct path to PQRHOME and validate it for security (details below), refusing to start if it is not secure, and thereafter using the direct path and not the value of $PQRHOME (when they differ). (This requires all file operations that use $PQRHOME to use not the value from getenv() but the mapped path. For example, this would require the software to establish that /home/attacker/pqr is a symlink to /opt/pqr, that the path to /opt/pqr is secure, and thereafter, whenever a file is referenced as $PQRHOME/some/thing, the name used would be /opt/pqr/some/thing and not /home/attacker/pqr/some/thing. This is a large code base - not trivial to fix.)
Ensure that all directories on $PQRHOME, even tracking through symlinks, are secure (details below, again), and the software refuses to start if anything is insecure.
Hard-code the path to the software install location. (This won't work PQR; it makes testing hell, if nothing else. For users, it means they can have but one version installed, and upgrades etc require parallel running. This does not work for PQR.)
Proposed criteria for secure paths:
For each directory, the owner must be trusted. (Rationale: the owner can change permissions at any time, so the owner must be trusted not to make changes at random that break the security of the software.)
For each directory, the group must either not have write privileges (so members of the group cannot modify the directory contents) or the group must be trusted. (Rationale: if the group members can modify the directory, then they can break the security of the software, so either they must be unable to change it, or they must be trusted not to changed it.)
For each directory, 'others' must have no write privilege on the directory.
By default, the users root, bin, sys, and pqrusr can be trusted (where bin and sys exist).
By default, the group with GID=0 (variously known as root, wheel or system), bin, sys, and pqrgrp can be trusted. Additionally, the group that owns the root directory (which is called admin on MacOS X) can be trusted.
The POSIX function realpath() provides a mapping service that will map /home/attacker/pqr to /opt/pqr; it does not do the security checking, but that need only be done on the resolved path.
So, with all that as background, is there any known software which goes through vaguely related gyrations to ensure its security? Is this being overly paranoid? (If so, why - and are you really sure?)
Edited:
Thanks for the various comments.
#S.Lott: The attack (outlined in the question) means that at least one setuid root program can be made to use a format string of the (unprivileged) user's choosing, and can at least crash the program and therefore most probably can acquire a root shell. It requires local shell access, fortunately; it is not a remote attack. It requires a non-negligible amount of knowledge to get there, but I consider it unwise to assume that the expertise is not 'out there'.
So, what I'm describing is a 'format string vulnerability' and the known attack path involves faking the program out so that although it thinks it is accessing secure message files, it actually goes and uses the message files (which contain format strings) that are under the control of the user, not under the control of the software.
Option 2 works, if you write a new value for $PQRHOME after resolving its real path and check its security. That way very little of your code needs changing thereafter.
As far as keeping the setuid privileges, it would help if you can do some sort of privilege separation, so that any operations involving input from the real user runs under the real uid. The privileged process and the real-uid process then talk using a socketpair or something like it.
Well, it sounds paranoid, but if it is or not depends on which system(s) your application is running on and which damage can an attacker do.
So, if your userbase is possibly hostile and if the damage is possibly very high, I'd go for the option 4, but modified as follows to remove its drawbacks.
Let me quote two relevant things:
1)
The programs currently check that $PQRHOME and key directories
under it are 'safe' (owned by pqrusr,
belong to pqrgrp, do not have public
write access).
2)
Thereafter, programs access files under $PQRHOME via the full
value of environment variable.
You don't need to actually hard-code the full path, you can hard-code just the relative path from the "program" you mentioned in 1) to the path mentioned in 2) where the files are.
Issue to control:
a) you must be sure that there isn't anything "attacker-accessible" (e.g. in term of symlinks) in between the executable's path and the files' path
b) you must be sure that the executable check its own path in a reliable way, but this should not be a problem in all the Unix'es I know (but I don't know all 'em and I don't know windows at all).
EDITED after the 3rd comment:
If your OS support /proc, the syslink /proc/${pid}/exe is the best way to solve b)
EDITED after sleeping on it:
Is the installation a "safe" process? If so, you might create (at installation time) a wrapper script. This script should be executable but not writable (and possibly neither readable). It would set the $PQRHOME env var to the "safe" value and then call your actual program (it might eventually do other useful things too). Since in UNIX the env vars of a running process cannot be changed by anything else but the running process, you are safe (of course the env vars can be changed by the parent before the process starts). I do not know if this approach works in Windows, though.

Resources