I am working on hardening a sandbox for student code execution. I'm satisfied that students can't share data via the file system or with signals, because I've found explicit rules governing those and the students' programs run as different unprivileged users. However, I'm having a hard time determining from the documentation who can see shared memory (or IPC objects more generally: message queues and semaphores) once it has been created. If you create shared memory, can anyone on the same machine open it, or is there a way to control that? Does the control lie with the program that creates the memory, or can the sysadmin limit it?
Any process in the same IPC namespace can see and (potentially) access IPC objects created by other processes in that namespace. Each IPC object carries the same user/group/other rwx permission bits as file system objects -- see the svipc(7) manual page.
You can create a new IPC namespace with the clone(2) system call and the CLONE_NEWIPC flag. The unshare(1) program does a clone-and-exec of another program with this or certain other CLONE_* flags.
A prerequisite to my question is the assumption that one uses a Kubernetes cluster with multiple pods accessing file system storage, for example Azure Files (ReadWriteMany). According to Microsoft this relies on SMB3 and should be safe for concurrent use.
Within one pod, Go (with the Gin framework) is used as the programming language for a server that accesses the file system. Since requests are handled in parallel by goroutines, the file system access is concurrent.
Does SMB3 also ensure correctness for concurrent ReadWrites within one pod, or do I need to manually manage file system operations with something like a Mutex or a Worker Pool?
More specifically: I want to use git2go, a Go wrapper around libgit2, which is written in C. According to libgit2's threading.md it is not thread safe:
Unless otherwise specified, libgit2 objects cannot be safely accessed by multiple threads simultaneously.
Since git2go does not specify otherwise, I assume that different references to the same repository on the file system cannot be used concurrently from different goroutines.
To summarise my question:
Do network communication protocols (like SMB3) also "make" local file system operations thread safe?
If not: how would I ensure thread-safe file system access in Go? lockedfile seems like a good option, but it is internal to the Go source tree.
I want to build a small web application in Rust which should be able to read and write files on a user's behalf. The user should authenticate with their UNIX credentials and then be able to read/write only the files they have access to.
My first idea, which also seems the most secure to me, would be to switch the user context of an application thread and do all the read/write work there. Is this possible?
If this is possible, what would the performance look like? I would assume spawning an operating system thread every time a request comes in could have a very high overhead. Is there a better way to do this?
I really wouldn't like to run my entire application as root and check the permissions manually.
On GNU/Linux, it is not possible to switch UID and GID just for a single thread of a process. The Linux kernel maintains per-thread credentials, but POSIX requires a single set of credentials per process: POSIX setuid must change the UID of all threads or none. glibc goes to great lengths to emulate the POSIX behavior, although that is quite difficult.
You would have to create a completely new process for each request, not just a new thread. Process creation is quite cheap on Linux, but it could still be a performance problem. You could keep a pool of processes around to avoid the overhead of repeated process creation. On the other hand, many years ago, lots of web sites (including some fairly large ones) used CGI to generate web pages, and you can get relatively far with a simple design.
I think @Florian got this backwards in his original answer. man 2 setuid says:
C library/kernel differences

At the kernel level, user IDs and group IDs are a per-thread attribute. However, POSIX requires that all threads in a process share the same credentials. The NPTL threading implementation handles the POSIX requirements by providing wrapper functions for the various system calls that change process UIDs and GIDs. These wrapper functions (including the one for setuid()) employ a signal-based technique to ensure that when one thread changes credentials, all of the other threads in the process also change their credentials. For details, see nptl(7).
Since libc does this signal dance for the whole process, you will have to make direct system calls to bypass it.
Note that this is Linux-specific. Most other Unix variants seem to follow POSIX at the kernel level instead of emulating it in libc.
Is there an operating system-specific way in Linux/Darwin/Windows, to restrict access to certain virtual memory pages to only one thread, so that when another thread tries to access it, the OS would intercept and report an error?
I'm trying to emulate the behavior of fork with multiple processes, where each process has its own memory except for some shared memory, mainly to avoid all programming errors where one worker would access memory belonging to another worker.
As a general proposition, this is not possible. The whole idea of threads is to have multiple streams of execution sharing the same address space. If you're a kernel-mode commando, you might be able to come up with some modification of the page tables a thread uses, so that certain pages are inaccessible from user mode until that thread unlocks them.
I have two processes that communicate through shared memory. One is privileged and trusted, the other is an LXC process and untrusted.
The trusted process creates a file in a directory that the LXC process can access and sets it to a fixed size with ftruncate.
Now it shares that file with the untrusted process by both of them mapping it read+write.
I want the untrusted process to be able to read and write to the mapping, which is safe, because the trusted process makes no assumptions about what has been written and carefully validates it.
However, with write access the untrusted process can ftruncate the file to zero (it can't increase its size due to mount restrictions), and this causes a SIGBUS in the privileged process (I have confirmed this).
Since there are many untrusted processes which communicate with the trusted one, this is basically a denial of service attack on the entire system, and Linux permits it. Is there any way to prevent this?
I could deny access to ftruncate, but there may be other system calls that do the same thing. Surely there is a way to allow a process to write to a file but not to resize it, rename it, or make any other metadata changes?
The best I can think of is falling back to the archaic System V shared memory, because that cannot be resized at all on Linux (not even by the privileged process).
Since Linux 3.17 you can use file seals for that purpose. They are supported only on tmpfs-backed files -- in practice, file descriptors created with memfd_create() using the MFD_ALLOW_SEALING flag. Before handing the file descriptor to the untrusted process, call fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) and your trusted process is safe from SIGBUS.
For details see manual pages for memfd_create() and fcntl().
Is there a way to change UID/GID only of one thread in a multithreaded process?
The reason for this is writing a file-serving application: ACLs and quotas are not enforced unless the uid/gid of the caller is set to the correct user, new files/directories are not created with the correct uid/gid, etc.
Network applications can usually fork() at startup and process each user request in a separate process; if data must be shared, it goes through some kind of shared memory. However, FUSE (the Linux userspace filesystem) uses multithreading by default, and in conjunction with the Python bindings it would not be practical to use a forking model.
The 'consistent' UID for a whole process appears to be required by the POSIX standard; however, old Linuxes didn't follow POSIX and allowed different uids for different threads. Newer kernels seem to follow POSIX -- is there some way to get the old 'broken' behaviour back?
The Linux-specific setfsuid() / setfsgid() are per-thread rather than per-process. They're designed specifically for this use case (file server).
Note that access() will still check access using the real uid and gid - that is by design (it is intended to answer the question "should the user who ran this binary have the given access to this file"). For the setfsuid() / setfsgid() case you should just try the requested operation and detect failure due to lack of permission at that point.
To change the uid of only one thread you need to invoke the system call directly: syscall(SYS_setresuid, ...). The libc function setresuid() synchronizes the change across all threads (using a signal it sends to every thread)!