Mounting proc in a non-privileged namespace sandbox (Linux)

I'm trying to make a sandboxed environment using Linux namespaces. I've found a neat example at https://github.com/swetland/mkbox that roughly does what I want, but I'd like a credible /proc to appear inside the sandbox. How can I do that?
I tried bind mounting the proc filesystem on "proc", but that fails with EINVAL. When I try to mount "proc" normally, it yields EPERM.
Any ideas?

A local guru figured this out for me: the bind mount of /proc must use the (undocumented?) MS_REC flag, like so:
ok(mount, "/proc", "proc", NULL, MS_REC|MS_BIND, NULL);
The bind mount only does something useful if CLONE_NEWPID is not set, obviously: with a new PID namespace, the bind-mounted host /proc would describe PIDs that don't match what the sandbox sees.
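For reference, outside mkbox's ok() wrapper that is just an ordinary recursive bind mount of the host's /proc onto the sandbox's proc directory, roughly:

if (mount("/proc", "proc", NULL, MS_BIND | MS_REC, NULL) < 0)
        perror("bind-mount /proc");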

I didn't look closely enough at your commit to know for sure if this is your issue, but EPERM will happen if you have CLONE_NEWUSER | CLONE_NEWNS but not CLONE_NEWPID. This is because in order to mount proc, you need CAP_SYS_ADMIN in the user namespace corresponding to the current PID namespace, not the current user namespace.
Linux 4.4, fs/proc/root.c, lines 112–117:
ns = task_active_pid_ns(current);
options = data;
/* Does the mounter have privilege over the pid namespace? */
if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
        return ERR_PTR(-EPERM);
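To illustrate the point, here is a minimal standalone sketch (mine, not from the original answers) that unshares user, mount and PID namespaces and then mounts a fresh proc without any privileges. A real sandbox would also write uid_map/gid_map and set up its own root, and the proc mount can still fail if parts of the host's /proc are masked by overmounts.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        /* New user, mount and PID namespaces; no root required. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID) == -1) {
                perror("unshare");
                return 1;
        }

        /* Keep our mount changes from propagating back to the host. */
        if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1) {
                perror("make-rprivate /");
                return 1;
        }

        /* CLONE_NEWPID only takes effect for children, so fork before mounting. */
        pid_t child = fork();
        if (child == -1) {
                perror("fork");
                return 1;
        }
        if (child == 0) {
                /* We are PID 1 here and hold CAP_SYS_ADMIN over the user namespace
                   that owns the new PID namespace, so mounting proc is allowed. */
                if (mount("proc", "/proc", "proc",
                          MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL) == -1) {
                        perror("mount proc");
                        _exit(1);
                }
                execlp("ps", "ps", "ax", (char *)NULL); /* shows only this PID namespace */
                _exit(127);
        }
        waitpid(child, NULL, 0);
        return 0;
}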

Related

echo into '/proc/filename' gives "Operation Not Permitted" error

Platform: Ubuntu, kernel 5.15.0-43-generic.
I have written a loadable kernel module that creates a file under /proc called testproc. The module loads perfectly and creates /proc/testproc. The permissions on /proc/testproc are 0666 and it is owned by root. I am logged in as root for all operations.
I have implemented the read and write handlers in my kernel module, and they do get called.
When I run the command
echo "Hello" > /proc/testproc
the error seen is
bash: echo: write error: Operation not permitted
I am using the call
proc_create("testproc", 0666, NULL, &procfsFuncs)
to create the entry under /proc
Any pointers much appreciated.
I figured out my (trivial) mistake. I was expecting a non-zero result from copy_from_user() on success, when in reality copy_from_user() returns the number of bytes it could not copy, i.e. 0 on success.
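For anyone hitting the same thing, here is a rough sketch of a working write handler (the name testproc_write and the 64-byte buffer are made up for illustration); the key points are that copy_from_user() returns the number of bytes it could not copy, and that the handler should return the number of bytes it consumed:

#include <linux/fs.h>
#include <linux/printk.h>
#include <linux/uaccess.h>

static ssize_t testproc_write(struct file *file, const char __user *ubuf,
                              size_t count, loff_t *ppos)
{
        char buf[64];
        size_t len = count < sizeof(buf) - 1 ? count : sizeof(buf) - 1;

        /* copy_from_user() returns the number of bytes NOT copied, so 0 means success. */
        if (copy_from_user(buf, ubuf, len))
                return -EFAULT;
        buf[len] = '\0';

        pr_info("testproc: received \"%s\"\n", buf);

        return count; /* report everything as consumed, otherwise the writer may retry */
}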

cannot create /dev/stdout: No such device or address

I want to run a shell command via node and capture the output written to stdout. My script works fine on OSX, but not on Ubuntu.
I've simplified the problem and script to the following node script:
var execSync = require('child_process').execSync,
result = execSync('echo "hello world" >> /dev/stdout');
// Do something with result
Results in:
/bin/sh: 1: cannot create /dev/stdout: No such device or address
I have tried replacing /dev/stdout with /dev/fd/1
I have tried changing the shell to bash... execSync('echo ...', {shell : '/bin/bash'})
Like I said, the problem above is simplified. The real script accepts as a parameter the name of a file where results should be written, so I need to resolve this by providing access to the stdout stream as a file descriptor, i.e. /dev/stdout.
How can I execute a command via node, while giving the command access to its own stdout stream?
On /dev/stdout
I don't have access to an OSX box, but judging from an issue on phantomjs, /dev/stdout is a symlink on both OSX/BSD and Linux, yet it behaves differently between them. One of the commenters said it's standard practice on OSX to use /dev/stdout but not on Linux, and elsewhere I've read statements implying /dev/stdout is pretty much an OSX thing. There might be a clue in a related answer as to why it doesn't work on Linux (the file descriptor seems to be implicitly closed when used this way).
Further related questions:
https://unix.stackexchange.com/questions/36403/portability-of-dev-stdout
bash redirect to /dev/stdout: Not a directory
The solution
I tried your code on Arch and it indeed gives me the same error, as do the variations mentioned - so this is not related to Ubuntu.
I found a blog post that describes how you can pass a file descriptor to execSync. Putting that together with what I got from here and here, I wrote this modified version of your code:
var fs = require('fs');
var path = require('path');
var fdout = fs.openSync(path.join(process.cwd(), 'stdout.txt'), 'a');
var fderr = fs.openSync(path.join(process.cwd(), 'stderr.txt'), 'a');
var execSync = require('child_process').execSync,
result = execSync('echo "hello world"', {stdio: [0,fdout,fderr] });
Unless I misunderstood your question, you want to be able to change where the output of the command in execSync goes. With this you can, using a file descriptor. You can still pass 1 and 2 if you want the called program to output to stdout and stderr as inherited by its parent, which you've already mentioned in the comments.
For future reference, this worked on Arch with kernel version 4.10.9-1-ARCH, on bash 4.4.12 and node v7.7.3.

How to get cwd for relative paths?

How can I get the current working directory in strace output for system calls that are invoked with relative paths? I'm trying to debug a complex application that spawns multiple processes and fails to open a particular file.
stat("some_file", 0x7fff6b313df0) = -1 ENOENT (No such file or directory)
Since some_file exists, I believe it's being looked for in the wrong directory. I tried tracing chdir calls too, but since the output is interleaved it's hard to deduce the working directory that way. Is there a better way?
You can use the -y option and it will print the full path. Another useful flag in this situation is -P which only traces syscalls relating to a specific path, e.g.
strace -y -P "some_file"
Unfortunately -y only prints the paths of file descriptors, and since your stat() call doesn't involve one, there is nothing for it to resolve. A possible workaround is to interrupt the process under a debugger when that syscall runs; you can then get its working directory by inspecting /proc/<PID>/cwd. Something like this (totally untested!):
gdb --args strace -P "some_file" -e inject=open:signal=SIGSEGV
Or you may be able to use a conditional breakpoint. Something like this should work, but I had difficulty with getting GDB to follow child processes after a fork. If you only have one process it should be fine I think.
gdb your_program
break open if $_streq((char*)$rdi, "some_file")
run
print getpid()
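Once you have the PID (however you catch the process), the working directory itself is just the /proc/<PID>/cwd symlink. A small sketch of my own that resolves it, with the PID passed on the command line:

#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        char link[64], cwd[PATH_MAX];
        ssize_t n;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }
        snprintf(link, sizeof link, "/proc/%s/cwd", argv[1]);
        n = readlink(link, cwd, sizeof cwd - 1); /* readlink does not NUL-terminate */
        if (n == -1) {
                perror("readlink");
                return 1;
        }
        cwd[n] = '\0';
        printf("%s\n", cwd);
        return 0;
}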
It is quite easy: use the function char *realpath(const char *path, char *resolved_path) to get the current directory.
This is my example:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
        char *abs;
        abs = realpath(".", NULL);
        printf("%s\n", abs);
        return 0;
}
output
root@ubuntu1504:~/patches_power_spec# pwd
/root/patches_power_spec
root@ubuntu1504:~/patches_power_spec# ./a.out
/root/patches_power_spec

Relinking an anonymous (unlinked but open) file

In Unix, it's possible to create a handle to an anonymous file by, e.g., creating and opening it with creat() and then removing the directory link with unlink() - leaving you with a file with an inode and storage but no possible way to re-open it. Such files are often used as temp files (and typically this is what tmpfile() returns to you).
My question: is there any way to re-attach a file like this back into the directory structure? If you could do this it means that you could e.g. implement file writes so that the file appears atomically and fully formed. This appeals to my compulsive neatness. ;)
When poking through the relevant system call functions I expected to find a version of link() called flink() (compare with chmod()/fchmod()), but, at least on Linux, this doesn't exist.
Bonus points for telling me how to create the anonymous file without briefly exposing a filename in the disk's directory structure.
A patch for a proposed Linux flink() system call was submitted several years ago, but when Linus stated "there is no way in HELL we can do this securely without major other incursions", that pretty much ended the debate on whether to add this.
Update: As of Linux 3.11, it is now possible to create a file with no directory entry using open() with the new O_TMPFILE flag, and link it into the filesystem once it is fully formed using linkat() on /proc/self/fd/<fd> with the AT_SYMLINK_FOLLOW flag.
The following example is provided on the open() manual page:
char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
/* File I/O on 'fd'... */
snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file", AT_SYMLINK_FOLLOW);
Note that linkat() will not allow open files to be re-attached after the last link is removed with unlink().
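Putting the pieces together, a fuller sketch (mine, with /some/dir and /some/dir/final_name as placeholder paths) might look like this; the temporary file and its final name must live on the same filesystem, and that filesystem has to support O_TMPFILE (e.g. ext4, XFS or tmpfs on reasonably recent kernels):

#define _GNU_SOURCE /* for O_TMPFILE */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        char path[PATH_MAX];
        const char *data = "fully formed before it ever gets a name\n";

        /* The file has an inode and storage in /some/dir's filesystem, but no name. */
        int fd = open("/some/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
        if (fd == -1) { perror("open O_TMPFILE"); return 1; }

        if (write(fd, data, strlen(data)) == -1) { perror("write"); return 1; }

        /* Give it a name only once the contents are complete. */
        snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, path, AT_FDCWD, "/some/dir/final_name",
                   AT_SYMLINK_FOLLOW) == -1) {
                perror("linkat"); /* EEXIST if final_name is already there */
                return 1;
        }
        return 0;
}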
My question: is there any way to re-attach a file like this back into the directory structure? If you could do this it means that you could e.g. implement file writes so that the file appears atomically and fully formed. This appeals to my compulsive neatness. ;)
If this is your only goal, you can achieve this in a much simpler and more widely used manner. If you are outputting to a.dat:
Open a.dat.part for write.
Write your data.
Rename a.dat.part to a.dat.
I can understand wanting to be neat, but unlinking a file and relinking it just to be "neat" is kind of silly.
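For completeness, that recipe in C might look roughly like this (same file names as in the steps above; the fsync() is there so the data is durable before it receives its final name):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *data = "hello\n";

        int fd = open("a.dat.part", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) { perror("open"); return 1; }

        if (write(fd, data, strlen(data)) == -1) { perror("write"); return 1; }
        if (fsync(fd) == -1) { perror("fsync"); return 1; } /* flush before the rename */
        close(fd);

        /* rename() atomically replaces any existing a.dat with the finished file. */
        if (rename("a.dat.part", "a.dat") == -1) { perror("rename"); return 1; }
        return 0;
}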
This question on serverfault seems to indicate that this kind of re-linking is unsafe and not supported.
Thanks to @mark4o for posting about linkat(2); see his answer for details.
I wanted to give it a try to see what actually happens when trying to link an anonymous file back into the filesystem it is stored on (often /tmp, e.g. for video data that firefox is playing).
As of Linux 3.16, there still appears to be no way to undelete a deleted file that's still held open. Neither AT_SYMLINK_FOLLOW nor AT_EMPTY_PATH for linkat(2) do the trick for deleted files that used to have a name, even as root.
The only alternative is tail -c +1 -f /proc/19044/fd/1 > data.recov, which makes a separate copy, and you have to kill it manually when it's done.
Here's the perl wrapper I cooked up for testing. Use strace -eopen,linkat linkat.pl - </proc/.../fd/123 newname to verify that your system still can't undelete open files. (Same applies even with sudo). Obviously you should read code you find on the Internet before running it, or use a sandboxed account.
#!/usr/bin/perl -w
# 2015 Peter Cordes <peter@cordes.ca>
# public domain. If it breaks, you get to keep both pieces. Share and enjoy
# Linux-only linkat(2) wrapper (opens "." to get a directory FD for relative paths)
if ($#ARGV != 1) {
print "wrong number of args. Usage:\n";
print "linkat old new \t# will use AT_SYMLINK_FOLLOW\n";
print "linkat - <old new\t# to use the AT_EMPTY_PATH flag (requires root, and still doesn't re-link arbitrary files)\n";
exit(1);
}
# use POSIX qw(linkat AT_EMPTY_PATH AT_SYMLINK_FOLLOW); #nope, not even POSIX linkat is there
require 'syscall.ph';
use Errno;
# /usr/include/linux/fcntl.h
# #define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
# #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
# #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
unless (defined &AT_SYMLINK_NOFOLLOW) { sub AT_SYMLINK_NOFOLLOW() { 0x0100 } }
unless (defined &AT_SYMLINK_FOLLOW ) { sub AT_SYMLINK_FOLLOW () { 0x0400 } }
unless (defined &AT_EMPTY_PATH ) { sub AT_EMPTY_PATH () { 0x1000 } }
sub my_linkat ($$$$$) {
# tmp copies: perl doesn't know that the string args won't be modified.
my ($oldp, $newp, $flags) = ($_[1], $_[3], $_[4]);
return !syscall(&SYS_linkat, fileno($_[0]), $oldp, fileno($_[2]), $newp, $flags);
}
sub linkat_dotpaths ($$$) {
open(DOTFD, ".") or die "open . $!";
my $ret = my_linkat(DOTFD, $_[0], DOTFD, $_[1], $_[2]);
close DOTFD;
return $ret;
}
sub link_stdin ($) {
my ($newp, ) = @_;
open(DOTFD, ".") or die "open . $!";
my $ret = my_linkat(0, "", DOTFD, $newp, &AT_EMPTY_PATH);
close DOTFD;
return $ret;
}
sub linkat_follow_dotpaths ($$) {
return linkat_dotpaths($_[0], $_[1], &AT_SYMLINK_FOLLOW);
}
## main
my $oldp = $ARGV[0];
my $newp = $ARGV[1];
# link($oldp, $newp) or die "$!";
# my_linkat(fileno(DIRFD), $oldp, fileno(DIRFD), $newp, AT_SYMLINK_FOLLOW) or die "$!";
if ($oldp eq '-') {
print "linking stdin to '$newp'. You will get ENOENT without root (or CAP_DAC_READ_SEARCH). Even then doesn't work when links=0\n";
$ret = link_stdin( $newp );
} else {
$ret = linkat_follow_dotpaths($oldp, $newp);
}
# either way, you still can't re-link deleted files (tested Linux 3.16 and 4.2).
# print STDERR
die "error: linkat: $!.\n" . ($!{ENOENT} ? "ENOENT is the error you get when trying to re-link a deleted file\n" : '') unless $ret;
# if you want to see exactly what happened, run
# strace -eopen,linkat linkat.pl
Clearly, this is possible -- fsck does it, for example. However, fsck does it with major, filesystem-specific mojo and will clearly not be portable, nor executable as an unprivileged user. It's similar in spirit to poking at the filesystem with debugfs.
Writing that flink(2) call would be an interesting exercise. As ijw points out, it would offer some advantages over the current practice of temporary-file renaming (rename(2), note, is guaranteed to be atomic).
Kind of late to the game but I just found http://computer-forensics.sans.org/blog/2009/01/27/recovering-open-but-unlinked-file-data which may answer the question. I haven't tested it, though, so YMMV. It looks sound.

Linux: getting the umask of an already running process?

How can I check the umask of a program which is currently running?
[update: another process, not the current process.]
You can attach gdb to a running process and then call umask in the debugger:
(gdb) attach <your pid>
...
(gdb) call umask(0)
[Switching to Thread -1217489200 (LWP 11037)]
$1 = 18 # this is the umask
(gdb) call umask(18) # reset umask
$2 = 0
(gdb)
(note: decimal 18 corresponds to a umask of 022 in octal in this example)
This suggests that there may be a really ugly way to get the umask using ptrace.
Beginning with Linux kernel 4.7, the umask is available in /proc/<pid>/status.
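For example, assuming the process of interest has PID 1234 (a made-up value), on a 4.7 or newer kernel it is as simple as:

grep Umask /proc/1234/status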
From the GNU C Library manual:
Here is an example showing how to read the mask with umask
without changing it permanently:
mode_t
read_umask (void)
{
  mode_t mask = umask (0);
  umask (mask);
  return mask;
}
However, it is better to use getumask if you just want to read
the mask value, because it is reentrant (at least if you use the
GNU operating system).
getumask is glibc-specific, though. So if you value portability, then the non-reentrant solution is the only one there is.
Edit: I've just grepped for ->umask all through the Linux source code. There is nowhere that will get you the umask of a different process. Also, there is no getumask; apparently that's a Hurd-only thing.
If you only need the umask of the current process, you can create a file in /tmp and check its permission bits. A better solution is to call umask(2) passing zero - the function returns the setting that was in effect prior to the call - and then restore it by passing that value back into umask.
The umask for another process doesn't seem to be exposed.
A colleague just showed me this command line pattern for this. I always have emacs running, so that's in the example below. The perl is my contribution:
sudo gdb --pid=$(pgrep emacs) --batch -ex 'call/o umask(0)' -ex 'call umask($1)' 2> /dev/null | perl -ne 'print("$1\n")if(/^\$1 = (\d+)$/)'
