libsandbox / pysandbox documentation

I want to execute code for an online-judge project, but I couldn't find documentation for the Python wrapper of libsandbox. I've found sample2.py and some test cases, but without explanations.
What are the defaults when creating a sandbox?
Is it secure by default?
I want to execute untrusted code and:
- Limit CPU
- Limit Memory
- Limit execution time
- Allow read/write access only to a specific folder and limit the size of this folder.
- Block network IO.
- Block executing other programs.
This code combines two examples I found:
from sandbox import *            # provides Sandbox and the S_RESULT_* codes
import sys

args = sys.argv                  # e.g. ./watchdog.py ./a.out
cookbook = {
    'args': args[1:],            # targeted program and its arguments
    'stdin': sys.stdin,          # input to targeted program
    'stdout': sys.stdout,        # output from targeted program
    'stderr': sys.stderr,        # error from targeted program
    'jail': './foo',             # chroot directory for the target
    'owner': 'nobody',           # run the target as this user
    'quota': dict(wallclock=30000,   # 30 sec
                  cpu=2000,          # 2 sec
                  memory=8388608,    # 8 MB
                  disk=1048576)}     # 1 MB
# create a sandbox instance and execute till end
s = Sandbox(**cookbook)
s.run()
assert s.result == S_RESULT_OK   # did the target finish normally?
What does disk in quota limit? Does it limit the total bytes the script can write in this run or the size of the folder?
What does setting owner to nobody do?
Will the code in my example block executing arbitrary code, block network IO and block access to files outside of the jailed folder?
Thanks

What are the defaults when creating a sandbox? Is it secure by default?
A Sandbox instance is permissive by default. It grants unlimited quotas unless you specify the quota argument, and it allows all system calls except the ones that lead to multi-processing (e.g. fork(), vfork(), clone(), ...) and inter-process communication (e.g. waitid(), ptrace(), ...), unless you filter system call events with a custom policy.
The sample code (sample2.py) distributed with libsandbox is a minimal working example of restrictive, white-list sandboxing. Use that as the framework of your watchdog program.
What does disk in quota limit? Does it limit the total bytes the script can write in this run or the size of the folder?
disk quota limits the total bytes that the target program can write to all eligible file systems (i.e. ones that support quota limitation and are capable of generating SIGXFSZ signals).
If the program writes a regular file on an ext3 or ext4 file system, that usually counts; writing to the standard output stream or to /dev/null does not count against the quota. Nevertheless, you can implement folder-based quotas within your custom policy.
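If you want to see the SIGXFSZ mechanism in isolation, the per-process RLIMIT_FSIZE limit is one standard way such signals are generated. A minimal plain-Python sketch (an illustration of the OS mechanism, not the pysandbox API) that also shows why /dev/null does not count:

import resource, signal

signal.signal(signal.SIGXFSZ, signal.SIG_IGN)  # turn the default kill into EFBIG
resource.setrlimit(resource.RLIMIT_FSIZE, (1048576, 1048576))  # 1 MB cap

try:
    with open('/tmp/big', 'wb', buffering=0) as f:
        f.write(b'\0' * 2_000_000)   # regular file: counts, refused past 1 MB
except OSError as e:
    print('write stopped at the cap:', e)

with open('/dev/null', 'wb') as f:
    f.write(b'\0' * 2_000_000)       # device file: never grows, never counts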
What does setting owner to nobody do?
It executes the targeted program on behalf of the user nobody. The owner argument wraps the OS-level service setuid(): after setuid() to nobody, the targeted program has all the permissions the OS grants to user nobody, and nothing beyond.
Please note that you have to be a superuser to be able to specify an owner other than yourself.
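To make the effect concrete, here is roughly what that privilege drop looks like at the OS level - a plain-Python illustration of the standard idiom, not pysandbox internals ('./a.out' is just a stand-in target):

import os, pwd

nobody = pwd.getpwnam('nobody')
os.setgroups([])            # drop supplementary groups first (requires root)
os.setgid(nobody.pw_gid)    # group before user, or setgid() will fail
os.setuid(nobody.pw_uid)    # irreversible once we are no longer root
os.execvp('./a.out', ['./a.out'])   # run the target with nobody's rights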
Will the code in my example block executing arbitrary code, block network IO and block access to files outside of the jailed folder?
Not exactly. All system calls made by the program are reported to the Sandbox, but you have to plug in a policy module that explicitly blocks system calls relating to network IO. Alternatively, you can filter all system calls against a white list, as is done by the sample code sample2.py.
Also note that you have to be a superuser to be able to specify a jail other than the root directory /.
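A condensed sketch of that white-list pattern follows. The names (SandboxPolicy, S_EVENT_SYSCALL, S_ACTION_KILL, S_RESULT_RF, the policy keyword) follow sample2.py as of libsandbox 0.3.x and should be checked against your copy:

from sandbox import *    # Sandbox, SandboxPolicy, S_* constants

class WhitelistPolicy(SandboxPolicy):
    def __init__(self, allowed):
        self.allowed = allowed                 # permitted syscall numbers

    def __call__(self, e, a):
        if e.type == S_EVENT_SYSCALL and e.data not in self.allowed:
            # anything off the list (socket(), execve(), open() outside
            # the jail, ...) is killed and reported as restricted function
            a.type, a.data = S_ACTION_KILL, S_RESULT_RF
            return a
        return SandboxPolicy.__call__(self, e, a)  # default handling

SAFE = {0, 1, 5, 8, 9, 10, 11, 12}   # illustrative only: copy the real
                                     # white list from sample2.py
s = Sandbox(args=['./a.out'], policy=WhitelistPolicy(SAFE))
s.run()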
DISCLAIMER: I am the author of libsandbox.

Related

Identify Resource Usage of a Process: CPU, Memory and I/O

I need to request some AWS resources, and in order to do so I need to identify my requirements for:
Number of CPU cores (maybe GPUs as well), since the program performs parallel processing
Amount of Memory required
I/O & network read/write time (optional / good to know)
How can I profile my script so that I know if I am using the requested resources to the fullest?
How can I profile the whole system? Something along the following lines:
i. Request large number of compute resources (CPUs and RAM) on AWS
ii. Start the system profiler
iii. Run my program and wait for it to finish
iv. Stop the system profiler and identify the peak #CPUs and RAM used
Context: Unix / Linux
You could use time(1), but there are two variants of it:
- the time builtin in most shells is usually not enough for your needs, so...
- you need the /usr/bin/time program from the time package.
Then you'll run /usr/bin/time -v yourscript, and inside your script you can use the times shell builtin.
You should also consider using perf(1) and oprofile(1).
At last, you might eventually code something, or use something, that queries the kernel through /proc/ (see proc(5)). Utilities like xosview do exactly that.
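As a concrete starting point for the /proc route, here is a small Linux-only sketch that launches a command and samples its peak resident set size (the VmHWM field of /proc/<pid>/status) until it exits; treat it as a skeleton to extend with CPU and IO fields:

import subprocess, sys, time

def vm_hwm_kb(pid):
    # VmHWM is the kernel's "high water mark" of resident memory
    try:
        with open(f'/proc/{pid}/status') as f:
            for line in f:
                if line.startswith('VmHWM:'):
                    return int(line.split()[1])     # value is in kB
    except FileNotFoundError:
        return None                                 # process already gone

proc = subprocess.Popen(sys.argv[1:])               # e.g. ./profile.py ./job
peak = 0
while proc.poll() is None:
    peak = max(peak, vm_hwm_kb(proc.pid) or 0)
    time.sleep(0.1)
print(f'peak resident set size: {peak} kB')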

Limiting the memory usage of a program in Linux

I'm new to Linux and the terminal (or whatever kind of command prompt it uses), and I want to control the amount of RAM a process can use. I already looked for hours to find an easy-to-use guide. I have a few requirements for limiting it:
Multiple instances of the program will be running, but I only want to limit some of the instances.
I do not want the process to crash once it exceeds the limit. I want it to use HDD page swap.
The program will run under WINE, and is a .exe.
So can somebody please help with the command to limit the RAM usage on a process in Linux?
The fact that you’re using Wine makes no difference in this particular context, which leaves requirements 1 and 2. Requirement 2 –
I do not want the process to crash once it exceeds the limit. I want it to use HDD page swap.
– is known as limiting the resident set size or rss of the process, and it’s actually rather nontrivial to do on Linux, as is demonstrated by a question asked in 2010. You’ll need to set up Linux control groups (cgroups). Fortunately, Justin L.’s answer gives a brief rundown on how to do so. Note that
instead of jlebar, you should use your own Unix user name, and
instead of your/program, you should use wine /path/to/Windows/program.exe.
Using cgroups will also satisfy your other requirements – you can start as many instances of the program as you wish, but only those which you start with cgexec -g memory:limited will be limited.
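If you'd rather drive the same mechanism from a script, here is a rough sketch of the cgroup-v1 steps. It assumes the memory controller is mounted at /sys/fs/cgroup/memory (distribution-dependent) and needs root; note that cgexec does the same job without the brief window where the process runs unlimited:

import os, subprocess

CG = '/sys/fs/cgroup/memory/limited'        # assumes a cgroup v1 layout

os.makedirs(CG, exist_ok=True)
with open(os.path.join(CG, 'memory.limit_in_bytes'), 'w') as f:
    f.write(str(512 * 1024 * 1024))          # 512 MB of RAM; beyond this the
                                             # kernel pushes pages out to swap
proc = subprocess.Popen(['wine', '/path/to/Windows/program.exe'])
with open(os.path.join(CG, 'cgroup.procs'), 'w') as f:
    f.write(str(proc.pid))                   # move just this instance into
                                             # the limited group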

how does kernel handle new file creation

I wish to understand the way the kernel works when a user/app tries to create a file in a directory.
The background - We have a Java application which consumes messages over JMS, processes them, and then writes the XML to an outbound queue plus a local directory. Yesterday we observed unusual delays in writing to the directory. On 'ls | wc -l' we found >300,000 files in there. A quick strace on the process found it full of mutex calls (more than 3/4 of the calls in the strace were mutex).
So I thought that new file creation is taking time because the system has to check certain things every time (e.g. file names, to make sure that a new file with a specific name can be created) amongst the 300,000 files, and only then create the file.
I cleared the directory and the application resumed normal service levels.
My questions:
Was my analysis correct? (It seems so, because the app started working fine after the clear-down.)
More important, how does the kernel work when you try to create a new file in a directory?
Can the abnormal number of mutex calls be attributed to the high number of files in the directory?
Many thanks
J
Please read about the Linux filesystem, i-nodes and directory entries (dentries).
http://en.wikipedia.org/wiki/Inode_pointer_structure
The file system is organized into fixed-size blocks. If your directory is relatively small, it fits in the direct blocks and things are fast. If it is not too big, it fits in the direct blocks and some indirect blocks, and is still reasonably fast. If your directory becomes too big, it spills into double indirect blocks and becomes slow.
Actual sizes depend on file system and kernel configuration.
Rule of thumb is to keep the directory under 12 blocks, depending on your block size. Many systems use 8K blocks; a fast directory is under 98,304 bytes.
A file entry is something like 16*4 bytes in size (IIRC), so plan on no more than 1500 files per directory as a practical upper limit.
Directories with large numbers of entries are often slow - how slow depends on the underlying filesystem.
The common solution is to create a hierarchy of directories, so each dir only has a few hundred entries.
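A sketch of that fan-out idea (a hypothetical helper; any language works the same way): derive a short hash prefix from the file name and nest the file a couple of levels deep, so every directory stays small:

import hashlib, os

def sharded_path(root, name, depth=2, width=2):
    # 'out.xml' -> root/ab/cd/out.xml, based on a hash prefix,
    # keeping every directory down to a few hundred entries
    digest = hashlib.md5(name.encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    directory = os.path.join(root, *parts)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, name)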
Mutex system calls are a result of the application (probably something in the JVM or the Java libraries) making mutex calls.
Synchronisation internal to the kernel you will not see via strace, as this only examines system calls themselves.
A directory with lots of files should not become inefficient if you are using a filesystem which uses directory indexes; most now do (ext3 does optionally but it's normally enabled nowadays).
Non-indexed directories (like those used on the bad old filesystems - ext2, vfat etc) get really bad with lots of files, and you'll see the "open" system call taking a lot longer.

Limiting the File System Usage Programmatically in Linux

I was assigned to write a system call for the Linux kernel which, oddly, determines (and reduces) users' maximum transfer amount per minute (for file operations). This system call will be called lim_fs_usage and will take a parameter for the maximum number of bytes all users can access in a minute. In short, I am going to limit the bandwidth of all filesystem operations in Linux. The project also asks for choosing an appropriate method for distributing this restricted resource (file access) among the users, but I think this won't be a big problem.
I did a long, long search but could not find a method for managing file system access programmatically. I thought of mapping the hard drive to memory (mmap()) and managing the memory operations, but this turned out to be useless. I also tried to find an API for the virtual file system in order to monitor and limit it, but I could not find one. Any ideas, please... Any help is greatly appreciated. Thank you in advance...
I wonder if you could do this as an IO scheduler implementation.
The main difficulty of doing IO bandwidth limitation under Linux is, by the time it reaches anywhere near the device, the kernel has probably long since forgotten who caused it.
Likewise, you can get on some very tricky ground in determining who is responsible for a given piece of IO:
If a binary is demand-loaded, who owns the IO doing that?
A mapped section of memory (a demand-loaded executable or otherwise) might be kicked out of memory because someone else used too much RAM, causing the kernel to evict those pages; paging them back in then places an unfair burden on the quota of the user who owned the mapping.
IO operations can be combined, and might come from different users
A write operation might cause an IO sooner or later depending on how the kernel schedules it; a later schedule may mean that fewer IOs need to be done in the long run, as another write gets done to the same block in the interim; writing to an already dirty block in cache does not make it any dirtier.
If you understand all these and more caveats, and still want to, I imagine doing it as an IO scheduler is the way to go.
IO schedulers are pluggable under Linux (2.6) and can be changed dynamically - the kernel waits for all IO on the device (IO scheduler is switchable per block device) to end and then switches to the new one.
Since it's urgent, I'll give you an idea off the top of my head, without doing any research on its feasibility -- what about inserting a hook to monitor the system calls that deal with file system access?
You might end up writing specialised kernel modules to handle the various filesystems (ext3, ext4, etc) but as a proof-of-concept you can start with one. Do not forget that root has reserved blocks in memory, process space and disk for his own operations.
Managing memory operations does not sound related to what you're trying to do (but perhaps I am mistaken here).
After a long period of thinking and searching, I decided to use the "hooking" method proposed. I am thinking of creating a new system call which initializes and manages a global variable like hdd_bandwidth_limit. This variable will be used in the modified implementations of the read() and write() system calls (in place of the "count" variable). Then I will decide on the distribution of this resource, which is the real issue. Probably I will find out how many users are using the system at a certain moment and divide this resource equally - a Round-Robin-like distribution. But still, I am open to suggestions on this distribution issue: should it be SJF, FCFS or Round-Robin? Synchronisation is another issue: how can I know whether a user's job is short or long, or whether he is done with the operation or not?
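The per-user accounting itself is easy to prototype outside the kernel first. Here is a user-space sketch (Python standing in for the eventual kernel code) of a token bucket that would sit inside the modified read()/write() path and answer "may user U transfer N bytes now?":

import time
from collections import defaultdict

class FsBandwidthLimiter:
    """Token bucket per uid: at most 'limit' bytes per minute,
    refilled continuously. A kernel version would need locking."""
    def __init__(self, limit_bytes_per_minute):
        self.cap = limit_bytes_per_minute
        self.rate = limit_bytes_per_minute / 60.0        # bytes per second
        self.tokens = defaultdict(lambda: self.cap)
        self.last = defaultdict(time.monotonic)

    def charge(self, uid, nbytes):
        now = time.monotonic()
        refill = (now - self.last[uid]) * self.rate
        self.tokens[uid] = min(self.cap, self.tokens[uid] + refill)
        self.last[uid] = now
        if self.tokens[uid] >= nbytes:
            self.tokens[uid] -= nbytes
            return True          # let the read()/write() proceed
        return False             # caller should sleep or fail with EAGAIN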

Detecting and controlling unauthorized shared memory reads

I was wondering - are there any known techniques to control access to a shared memory object from anywhere but an authorized program?
For instance, lets say I create a shared memory segment for use in a program P, to be accessed by Q, and I make it Read-Write. I can access it using Q because I've given it (Q) the required permissions to do so (running as a particular user with groups, etc).
However, I'm guessing there are instances where someone could potentially access this shared memory from a program R - simply attaching to it and modifying it. To stop this, you could make the memory segment read only - but now program R could still read what was in the memory.
My question is in parts -
Is there a way to,
a) allow only Q to access the shared memory?
b) figure whether a read was done by someone apart from Q - and who did it? [Is this even possible?] For bonus points, could this be done cross-platform? [Probably not, but no harm trying :)]
Under what circumstances could a rogue program attach to the shared memory? I presume one way is if a user is able to exploit OS holes and become the user that started the program. Any others?
System V shared memory (the kind managed with shmget/shmctl) has the same permission system as files - if you run ipcs you'll see the permissions of the shared memory segments on your system:
$ ipcs -m
IPC status from <running system> as of Tue Jul 14 23:21:25 BST 2009
T ID KEY MODE OWNER GROUP
Shared Memory:
m 65536 0x07021999 --rw-r--r-- root wheel
m 65537 0x60022006 --rw-r--r-- root wheel
In answer to question 1a), you can use the normal UNIX permission system to allow access only from a certain user and/or group. This can be controlled using shmctl(), whose IPC_SET command takes a struct shmid_ds containing the ipc_perm settings:
#include <sys/ipc.h>
#include <sys/shm.h>

struct shmid_ds ds;

shmctl(shmid, IPC_STAT, &ds);   /* fetch the current settings        */
ds.shm_perm.uid  = 100;         /* allow read/write only by          */
ds.shm_perm.gid  = 200;         /* uid '100' or members of group '200' */
ds.shm_perm.mode = 0660;
shmctl(shmid, IPC_SET, &ds);
For 1b), I don't think any auditing interfaces exist for shared memory access.
With regard to your second question, any process running as the shm owner/group, or running as root, will be able to access your memory - this is no different from accessing any other resource. Root can always access anything on a *ix system, so any exploit which escalated a user to root would allow access to any shared memory region.
