Prevent truncating of a shared mapped file? - linux

I have two processes that communicate through shared memory. One is privileged and trusted, the other is an LXC process and untrusted.
The trusted process creates a file in a directory that the LXC process can access. It sets it to a fixed size with ftruncate.
Now it shares that file with the untrusted process by both of them mapping it read+write.
I want the untrusted process to be able to read and write to the mapping, which is safe, because the trusted process makes no assumptions about what has been written and carefully validates it.
However, with write access the untrusted process can ftruncate the file to zero (it can't increase its size due to mount restrictions), and this causes a SIGBUS in the privileged process (I confirmed this).
Since there are many untrusted processes which communicate with the trusted one, this is basically a denial of service attack on the entire system, and Linux permits it. Is there any way to prevent this?
I could deny access to ftruncate, but there may be other system calls that do the same thing. Surely there is a way to allow a process to write to a file but not to resize it, rename it, or make any other metadata changes?
The best I can think of is to fall back to the archaic System V shared memory, because that cannot be resized at all on Linux (not even by the privileged process).

Since Linux 3.17 you can use file seals for exactly this purpose. They are supported only on tmpfs-backed files; in practice that means file descriptors created with memfd_create() and the MFD_ALLOW_SEALING flag. Before handing the file descriptor to the untrusted process, call fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK), and your trusted process is safe from SIGBUS.
For details see manual pages for memfd_create() and fcntl().
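A minimal sketch of the trusted side, assuming the buffer is created with memfd_create() (the name "ipc-buffer" and the 4096-byte size are illustrative):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t shm_size = 4096;   /* illustrative size */

        /* MFD_ALLOW_SEALING is required, otherwise F_ADD_SEALS fails with EPERM. */
        int fd = memfd_create("ipc-buffer", MFD_CLOEXEC | MFD_ALLOW_SEALING);
        if (fd == -1) { perror("memfd_create"); exit(1); }

        if (ftruncate(fd, shm_size) == -1) { perror("ftruncate"); exit(1); }

        /* Forbid shrinking; F_SEAL_SEAL also forbids removing the seal later. */
        if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_SEAL) == -1) {
            perror("fcntl(F_ADD_SEALS)"); exit(1);
        }

        void *p = mmap(NULL, shm_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        /* Hand fd to the untrusted process (e.g. over a Unix-domain socket
           with SCM_RIGHTS).  Its ftruncate(fd, 0) now fails with EPERM
           instead of causing SIGBUS in the mapper. */
        return 0;
    }

After the seal is in place, the untrusted side can still read and write through its own mapping; only operations that would shrink the file are refused.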

Related

What is the usage of open_by_handle_at(2)?

I learned about this system call from a Docker breakout technique that uses CAP_DAC_READ_SEARCH.
I am wondering what this system call was originally designed for. Is there any other typical, common usage for it? After a while of searching, I found that the most significant use of open_by_handle_at(2) is breaking out of containers...
From man open_by_handle_at:
These system calls are designed for use by user-space file servers. For example, a user-space NFS server might generate a file handle and pass it to an NFS client. Later, when the client wants to open the file, it could pass the handle back to the server. This sort of functionality allows a user-space file server to operate in a stateless fashion with respect to the files it serves.
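For illustration, here is a hedged sketch of the intended round trip; the path "/etc/hostname" is arbitrary, and the open_by_handle_at() step needs CAP_DAC_READ_SEARCH:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int mount_id;

        /* First call with handle_bytes == 0 to learn the required size;
           it fails with EOVERFLOW and fills in probe.handle_bytes. */
        struct file_handle probe = { .handle_bytes = 0 };
        if (name_to_handle_at(AT_FDCWD, "/etc/hostname", &probe,
                              &mount_id, 0) == -1 && errno != EOVERFLOW) {
            perror("name_to_handle_at"); exit(1);
        }

        struct file_handle *fh = malloc(sizeof(*fh) + probe.handle_bytes);
        fh->handle_bytes = probe.handle_bytes;
        if (name_to_handle_at(AT_FDCWD, "/etc/hostname", fh,
                              &mount_id, 0) == -1) {
            perror("name_to_handle_at"); exit(1);
        }

        /* A file server would now send fh to a client and receive it back.
           mount_fd is any open fd on the target filesystem; "/" is only
           illustrative. */
        int mount_fd = open("/", O_RDONLY | O_DIRECTORY);
        int fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
        if (fd == -1) { perror("open_by_handle_at"); exit(1); }

        /* fd now refers to the file, located by handle rather than by path. */
        return 0;
    }

The container-breakout trick abuses exactly this mechanism: a handle is largely an inode number, so a process holding CAP_DAC_READ_SEARCH can guess or brute-force handles for files outside its mount namespace.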

Who can share shared memory in Linux?

I am working on hardening a sandbox for student code execution. I think I'm satisfied that students can't share data on the file system or with signals because I've found express rules dictating those and they execute as different unprivileged users. However, I am having a really hard time looking at documentation to determine, when shared memory (or IPC more generally - queues or semaphores) is created, who can see that. If you create shared memory, can anyone on the same machine open it, or is there a way to control that? Does the control lie in the program that creates the memory, or can the sysadmin limit it?
Any process in the same IPC namespace can see and (potentially) access IPC objects created by other processes in that namespace. Each IPC object carries the same user/group/other rwx permission bits as filesystem objects -- see the svipc(7) manual page.
You can create a new IPC namespace with the clone(2) system call and the CLONE_NEWIPC flag. You can also use the unshare(1) program to run another program in a new namespace with this or certain other flags.
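To make the isolation concrete, here is a small sketch; it must run with CAP_SYS_ADMIN (e.g. as root), and the 4096-byte segment size is illustrative:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* Move this process into a fresh IPC namespace. */
        if (unshare(CLONE_NEWIPC) == -1) {
            perror("unshare(CLONE_NEWIPC)"); exit(1);
        }

        /* Mode 0600 additionally restricts the segment to the owning user,
           just like file permission bits. */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id == -1) { perror("shmget"); exit(1); }

        /* Processes outside this IPC namespace cannot see or attach to the
           segment, even if they learn its id. */
        printf("segment id in private namespace: %d\n", id);
        return 0;
    }

For the sandbox case, running each student's code under unshare --ipc (or a clone with CLONE_NEWIPC) gives every student a private IPC namespace, so nothing they create is visible to the others.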

mmap file shared via nfs?

Scenario A:
To share a read/write block of memory between two processes running on the same host, Joe mmaps the same local file from both processes.
Scenario B:
To share a read/write block of memory between two processes running on two different hosts, Joe shares a file via nfs between the hosts, and then mmaps the shared file from both processes.
Has anyone tried Scenario B? What extra problems arise in Scenario B that do not apply to Scenario A?
mmap will not share data without some additional work.
If you change data in the mmapped part of the file, the changes are initially stored only in memory. They will not be flushed to the filesystem (local or remote) until msync() or munmap() is called, or until the OS kernel and its filesystem decide to write them back on their own.
When using NFS, locking and storing of data will be slower than on a local FS. Flush timeouts and the duration of file operations will vary too.
On the sister site, people say that NFS may have a poor caching policy, so there will be many more I/O requests to the NFS server than there would be against a local FS.
You will also need byte-range locks for correct behavior; they are available in NFS >= v4.0.
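A rough sketch of the "lock, write, flush" sequence this implies, assuming the file sits on an NFSv4 mount (the path and region are illustrative, and NFS client caching may still delay visibility on the other host):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 4096;
        int fd = open("/mnt/nfs/shared.bin", O_RDWR);  /* illustrative path */
        if (fd == -1) { perror("open"); exit(1); }

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        /* Take a byte-range write lock over the region we will modify. */
        struct flock fl = {
            .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = len,
        };
        if (fcntl(fd, F_SETLKW, &fl) == -1) { perror("fcntl(F_SETLKW)"); exit(1); }

        memcpy(p, "hello", 5);

        /* Force the dirty pages out so the other host has a chance to see them. */
        if (msync(p, len, MS_SYNC) == -1) { perror("msync"); exit(1); }

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        return 0;
    }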
I'd say scenario B has all kinds of problems (assuming it works as suggested in the comments). The most obvious is the standard concurrency issue: two processes sharing one resource with no form of locking. That could lead to problems... I'm not sure whether NFS has its own peculiar quirks in this regard or not.
Assuming you can get around the concurrency issues somehow, you are now reliant on maintaining a stable (and speedy) network connection. Obviously if the network drops out, you might miss some changes. Whether this matters depends on your architecture.
My thought is that it sounds like an easy way to share a block of memory across machines, but I can't say I've heard of it being done, which makes me think it isn't that good. When I think of sharing data between processes, I think of DBs, messaging, or a dedicated server. In this case, if you made one process the master (to handle concurrency and own the data: whatever it says is the authoritative copy), it might work...

Change UID/GID only of one thread in Linux

Is there a way to change UID/GID only of one thread in a multithreaded process?
The reason for this is writing a file-serving application - ACLs and quotas are not enforced unless the UID/GID of the caller is set to the correct user, new files/directories are not created with the correct UID/GID, and so on.
Network applications can usually fork() at startup and handle each user request in a separate process. If data needs to be shared, it must go through some kind of shared memory. However, FUSE (the Linux userspace filesystem), for example, uses multithreading by default, and in conjunction with the Python bindings it would not be practical to use a forking model.
A UID that is consistent across the whole process seems to be what the POSIX standard requires; however, old Linux kernels did not follow POSIX and allowed different UIDs for different threads. Newer kernels seem to follow POSIX, so is there some way to allow the old 'broken' behaviour?
The Linux-specific setfsuid() / setfsgid() are per-thread rather than per-process. They're designed specifically for this use case (file server).
Note that access() will still check access using the real uid and gid - that is by design (it is intended to answer the question "should the user who ran this binary have the given access to this file"). For the setfsuid() / setfsgid() case you should just try the requested operation and detect failure due to lack of permission at that point.
To change the UID for only one thread you need to use the syscall directly: syscall(SYS_setresuid, ...). The libc function setresuid() synchronizes the change across all threads (using a signal it sends to every thread)!
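A minimal sketch of both per-thread approaches; the uid/gid values come from wherever your server authenticates the user, and note that setfsuid() reports the previous fsuid rather than an error code:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/fsuid.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Call from the worker thread that should impersonate the user. */
    static void impersonate(uid_t uid, gid_t gid)
    {
        /* Option 1: fsuid/fsgid are used for filesystem permission checks
           only and are per-thread by design (meant for file servers). */
        setfsgid(gid);
        setfsuid(uid);

        /* Option 2: the raw syscall changes the effective UID of the calling
           thread only, bypassing glibc's broadcast to all threads. */
        if (syscall(SYS_setresuid, (uid_t)-1, uid, (uid_t)-1) == -1)
            perror("SYS_setresuid");
    }

In a real server you would pick one of the two, and the thread must still hold CAP_SETUID (or start as root) for either call to succeed.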

How is it possible to access memory of other processes?

I thought that one process cannot read the memory of other processes, but I was shocked to see an application named "WinHex" which has a "RAM Editor" and is able to access the entire memory of every process.
How is that possible? It is even able to modify the memory of other processes. Doesn't this make it malicious?
In all likelihood, the tool uses ReadProcessMemory or some variant, which requires PROCESS_VM_READ access.
With respect to your "malicious" comment, remember that you (or the process invoking this API, which likely needs Administrator-level permissions) already have total control over the machine. The security game is already lost at that point.
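For illustration, a minimal sketch of that API; the target PID and address are placeholders, and reading most other processes requires Administrator rights or SeDebugPrivilege:

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        DWORD pid = 1234;                 /* placeholder target PID */
        LPCVOID addr = (LPCVOID)0x10000;  /* placeholder address in the target */
        char buf[64];
        SIZE_T got = 0;

        HANDLE h = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION,
                               FALSE, pid);
        if (h == NULL) {
            fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError());
            return 1;
        }

        if (ReadProcessMemory(h, addr, buf, sizeof buf, &got))
            printf("read %zu bytes from pid %lu\n", (size_t)got, pid);
        else
            fprintf(stderr, "ReadProcessMemory failed: %lu\n", GetLastError());

        CloseHandle(h);
        return 0;
    }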
Well, that's one of the things a process with the right privileges, granted by the operating system, can do. Processes cannot access other processes' memory in principle. In practice the underlying operating system usually offers this mechanism to privileged processes.
Accessing other processes' memory is a piece of cake.
You can even use the Windows Driver Kit to access and modify everything.
Check out rootkits, for example, to see how fragile the OS is when you don't restrict programs' privileges.
If you're running as Administrator, you can obtain privileges to read all of memory; it seems that WinHex is doing this on your behalf.
Have you tried this on a more restricted account?
I think it uses some DLL injection technique.
See http://en.wikipedia.org/wiki/DLL_injection for more information
