I learned about this system call from the Docker breakout technique that abuses CAP_DAC_READ_SEARCH.
I am wondering what this system call was originally designed for. Is there any other typical, common use for it? After a while of searching, the most significant usage of open_by_handle_at(2) I could find is breaking out of containers...
From man open_by_handle_at:
These system calls are designed for use by user-space file
servers. For example, a user-space NFS server might generate a
file handle and pass it to an NFS client. Later, when the client
wants to open the file, it could pass the handle back to the
server. This sort of functionality allows a user-space file
server to operate in a stateless fashion with respect to the
files it serves.
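For illustration, here is a rough sketch of that server-style usage in C: the "server" turns a pathname into an opaque handle with name_to_handle_at(), and later reopens the file from the handle alone with open_by_handle_at() (the step that needs CAP_DAC_READ_SEARCH). The path /export/data/file and the mount point are made up for the example.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    int mount_id;

    fh->handle_bytes = MAX_HANDLE_SZ;

    /* "Server" side: turn a pathname into an opaque, persistent handle
       that can be stored or sent to a client. */
    if (name_to_handle_at(AT_FDCWD, "/export/data/file", fh, &mount_id, 0) == -1) {
        perror("name_to_handle_at");
        return 1;
    }

    /* Later (possibly in another process holding CAP_DAC_READ_SEARCH):
       reopen the file from the handle alone. mount_fd must be an open
       fd on the filesystem the handle belongs to. */
    int mount_fd = open("/export/data", O_RDONLY | O_DIRECTORY);
    int fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
    if (fd == -1) {
        perror("open_by_handle_at");
        return 1;
    }

    close(fd);
    close(mount_fd);
    free(fh);
    return 0;
}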
Related
I have two processes that communicate through shared memory. One is privileged and trusted, the other is an LXC process and untrusted.
The trusted process creates a file in a directory that the LXC process can access. It sets it to a fixed size with ftruncate.
Now it shares that file with the untrusted process by both of them mapping it read+write.
I want the untrusted process to be able to read and write to the mapping, which is safe, because the trusted process makes no assumptions about what has been written and carefully validates it.
However, with write access the untrusted process can ftruncate the file to zero (it can't increase its size due to mount restrictions), and this causes a SIGBUS in the privileged process (I confirmed this).
Since there are many untrusted processes which communicate with the trusted one, this is basically a denial of service attack on the entire system, and Linux permits it. Is there any way to prevent this?
I could deny access to ftruncate, but there may be other system calls that do the same thing. Surely there is a way to allow a process to write to a file but not to resize it, rename it, or make any other metadata changes?
The best I can think of is to fall back to the archaic System V shared memory, because that cannot be resized at all on Linux (not even by the privileged process).
Since Linux version 3.17 you can use file seals for that purpose. They are supported only on tmpfs, so they will work with POSIX shared memory and with shared files created with memfd_create(). Before handing the file descriptor to the untrusted process, call fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) and your trusted process is safe from SIGBUS.
For details see manual pages for memfd_create() and fcntl().
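A minimal sketch of that setup, assuming Linux >= 3.17 and a glibc that exposes memfd_create(); the region name and size below are just for illustration:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* Create an anonymous tmpfs-backed file that allows sealing. */
    int fd = memfd_create("shared_region", MFD_ALLOW_SEALING);
    if (fd == -1) { perror("memfd_create"); return 1; }

    /* Fix the size before sealing. */
    if (ftruncate(fd, 4096) == -1) { perror("ftruncate"); return 1; }

    /* After this, any ftruncate() that would shrink the file fails with
       EPERM, so the trusted process can no longer be hit with SIGBUS. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) == -1) {
        perror("fcntl(F_ADD_SEALS)");
        return 1;
    }

    /* fd can now be handed to the untrusted process, e.g. over a
       UNIX domain socket with SCM_RIGHTS. */
    return 0;
}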
Scenario A:
To share a read/write block of memory between two processes running on the same host, Joe mmaps the same local file from both processes.
Scenario B:
To share a read/write block of memory between two processes running on two different hosts, Joe shares a file via nfs between the hosts, and then mmaps the shared file from both processes.
Has anyone tried Scenario B? What extra problems arise in Scenario B that do not apply to Scenario A?
mmap alone will not keep the data in sync; some additional actions are required.
If you change data in the mmapped part of the file, the changes are stored only in memory. They will not be flushed to the filesystem (local or remote) until msync() or munmap() is called, or until the kernel and its filesystem decide to write them back, as the sketch after this answer shows.
When using NFS, locking and storing data will be slower than on a local filesystem, and flush timeouts and the latency of file operations will vary as well.
On the sister site, people say that NFS may have a poor caching policy, so there will be many more I/O requests to the NFS server compared with a local FS.
You will need byte-range locks for correct behavior; they are available in NFS >= v4.0.
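To make the flush point concrete, here is a small sketch (the path and size are made up): one process writes through a shared mapping and calls msync() so the change actually reaches the underlying file, which matters even more when that file lives on NFS.

#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/shared/region.bin", O_RDWR);
    if (fd == -1) { perror("open"); return 1; }

    size_t len = 4096;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Update the shared region in memory. */
    memcpy(p, "hello", 5);

    /* Without this (or munmap, or eventual kernel writeback), the peer
       on the other host never sees the update when the file is on NFS. */
    if (msync(p, len, MS_SYNC) == -1)
        perror("msync");

    munmap(p, len);
    close(fd);
    return 0;
}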
I'd say scenario B has all kinds of problems (assuming it works as suggested in the comments). The most obvious is the standard concurrency issue: two processes sharing one resource with no form of locking, etc. That could lead to problems... Not sure whether NFS has its own peculiar quirks in this regard or not.
Assuming you can get around the concurrency issues somehow, you are now reliant on maintaining a stable (and speedy) network connection. Obviously if the network drops out, you might miss some changes. Whether this matters depends on your architecture.
My thought is that it sounds like an easy way to share a block of memory between different machines, but I can't say I've heard of it being done, which makes me think it isn't so good. When I think of sharing data between processes, I think of databases, messaging, or a dedicated server. In this case, if you made one process the master (to handle concurrency and own the data, i.e. whatever it says is the authoritative copy), it might work...
Hi.
I am working on an experiment that lets users use 1% of my CPU. It's like your own web server, but a big dynamic remote-execution framework (don't ask about that), and I don't want users to be able to use API functions: no creating files, no sockets, no threads, no console output, nothing.
Update 1: People will be sending me binaries, so interrupt 0x80 is possible. Therefore... kernel?
I need to limit a process so it cannot do anything but use a single pipe. Through that pipe the process will use my own wrapped and controlled API.
Is that even possible? I was thinking of something like a Linux kernel module.
Limiting RAM and CPU is not the primary issue here; there is plenty on Google for that.
Thanks in advance!
The ptrace facility will allow your program to observe and control the operation of another process. Using the PTRACE_SYSCALL flag, you can stop the child process before every syscall, and make a decision about whether you want to allow that system call to proceed.
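As a rough sketch of that approach (assuming Linux on x86-64; the traced program /bin/ls is just a stand-in for the untrusted binary), the parent stops the child at every syscall boundary and inspects the syscall number before letting it continue:

#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent trace us */
        execl("/bin/ls", "ls", (char *)NULL);    /* the untrusted binary would go here */
        _exit(1);
    }

    int status;
    waitpid(child, &status, 0);                  /* child stops at execve */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to the next syscall stop */
        waitpid(child, &status, 0);
        if (WIFSTOPPED(status)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            /* Each syscall triggers two stops (entry and exit); decide here
               whether to allow it, rewrite it, or kill the child. */
            printf("syscall %lld\n", (long long)regs.orig_rax);
        }
    }
    return 0;
}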
You might want to look at what Google is doing with their Native Client technology and the seccomp sandbox. The Native Client (NaCl) stuff is intended to let x86 binaries supplied by a web site run inside a user's local browser. The problem of malicious binaries is similar to what you face, so most of the technology/research probably applies directly.
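The original seccomp "strict" mode is an even closer match to the "nothing but a pipe" requirement: after the prctl() call below, the process may only use read(), write(), _exit() and sigreturn(); anything else kills it with SIGKILL. A minimal sketch (the pipe setup is illustrative; note that glibc's _exit() calls exit_group(), which strict mode does not allow, so the raw exit syscall is used):

#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1)              /* open the communication pipe first */
        return 1;

    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
        return 1;

    /* From here on, only read/write on already-open fds, _exit and
       sigreturn are permitted; any other syscall kills the process. */
    write(fds[1], "ok\n", 3);

    syscall(SYS_exit, 0);             /* raw exit(2); exit_group(2) would be killed */
    return 0;                         /* not reached */
}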
Take the following code snippet:
int f = open("/mnt/remoteserver/bar/foo.bin", O_RDONLY);
char buffer[1000];
ssize_t bytesread;

while (true)
{
    bytesread = read(f, buffer, sizeof(buffer));
    if (bytesread > 0)
        ProcessBytes(buffer, bytesread);
    else
        break;
}
In the example above, let's say the remote file foo.bin is 1 MB and has never been accessed by the client before. That's approximately 1000 calls to read() to fetch the entire file.
Further, let's say the server with the directory mounted on the client is over the internet and not local. Fast bandwidth to the client, but with long latency.
Does every "read" call invoke a round trip back to the server to ask for more data? Or does the client/server protocol recognize that subsequent reads on a remote file are often sequential, and as such, subsequent blocks are pushed down before the application has actually made a read() call for it. Hence, subsequent read calls return faster because the data was pre-fetched and cached.
Do modern network file system protocols (NFS, SMB/Samba, any others?) make any optimizations like this? Are there network file system protocols tuned for the internet that have optimizations like this?
I'm investigating a personal project that may involve implementation of a network file system over the internet. It struck me that performance may be faster if the number of round trips could be reduced for file i/o.
This is going to be very dependent on the protocol implementation. In general, I don't think most client implementations prefetch, but most savvy storage admins use large block sizes (32+ KB; see the rsize/wsize mount options), which effectively results in the same thing. Network file systems are typically cached via the system's buffer cache as well, so read() calls will definitely not translate directly into network I/O.
My advice would be to write your program naively (or a simple test case), get comfortable reading the network stats via nfsstat, etc., and then optimize from there. There are far too many variables to get the answer any other way.
I'm no expert, but from what I can tell NFSv4 has more WAN optimizations than the older protocols (NFSv2/v3, CIFS), so I'd definitely factor it into your mix. That said, most remote filesystem protocols aren't really designed for high-latency access, which is why we end up with systems like S3, which are.
I am looking for a good way to manage access to an external FTP server from various programs on a single server.
Currently I am working with a lock file, so that only one process can use the FTP server at a time. What would be a good way to allow 2-3 parallel processes to access the FTP server simultaneously? Unfortunately the provider does not allow more sessions and locks my account for a day if too many processes access their server.
The platforms used are Solaris and Linux; all FTP access is encapsulated in a single library, so there is only one function I need to change. It would be nice if there were something on CPAN.
I'd look into perlipc(1) for System V semaphores, or modules like POSIX::RT::Semaphore for POSIX semaphores. I'd create a semaphore with a resource count of 2-3 and then have each process try to acquire the semaphore.
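The Perl modules wrap the same primitives, so the underlying pattern looks roughly like this POSIX named-semaphore sketch in C (the semaphore name /ftp_slots and the count of 3 are made up; link with -pthread):

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>

int main(void)
{
    /* Created on first use with an initial value of 3: at most three
       processes can hold an FTP "slot" at the same time. */
    sem_t *sem = sem_open("/ftp_slots", O_CREAT, 0644, 3);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return 1;
    }

    sem_wait(sem);      /* blocks until one of the 3 slots is free */
    /* ... talk to the FTP server here ... */
    sem_post(sem);      /* release the slot */

    sem_close(sem);
    return 0;
}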
Instead of making a bunch of programs wait in line, could you create one local program that handled all the remote communication while the local programs talked to it? You effectively create a proxy and push that complexity away from your programs so you don't have to deal with it in every program.
I don't know the other constraints on your problem, but this has worked for me on similar issues.