I want to make a web service that runs other people's code locally. Naturally, I want to limit their code's access to a certain "sandbox" directory, so that they won't be able to connect to other parts of my server (DB, main webserver, etc.)
What's the best way to do this?
Run VMware/Virtualbox:
+ I guess it's as secure as it gets. Even if someone manage to "hack", they only hack the guest machine
+ Can limit the CPU & memory the processes use
+ Easy to set up - just create the VM
- Harder to "connect" the sandbox directory from the host to the guest
- Wasting extra memory and CPU for managing the VM
Run underprivileged user:
+ Doesn't waste extra resources
+ Sandbox directory is just a plain directory
? Can't limit CPU and memory?
? I don't know if it's secure enough
Any other way?
Server running Fedora Core 8, the "other" codes written in Java & C++
To limit CPU and memory, you want to set limits for groups of processes (POSIX resource limits only apply to individual processes). You can do this using cgroups.
For example, to limit memory start by mounting the memory cgroups filesystem:
# mount cgroup -t cgroup -o memory /cgroups/memory
Then, create a new sub-directory for each group, e.g.
# mkdir /cgroups/memory/my-users
Put the processes you want constrained (process with PID "1234" here) into this group:
# cd /cgroups/memory/my-users
# echo 1234 >> tasks
Set the total memory limit for the group:
# echo 1000000 > memory.limit_in_bytes
If processes in the group fork child processes, they will also be in the group.
The above group sets the resident memory limit (i.e. constrained processes will start to swap rather than using more memory). Other cgroups let you constrain other things, such as CPU time.
You could either put your server process into the group (so that the whole system with all its users fall under the limits) or get the server to put each new session into a new group.
chroot, jail, container, VServer/OpenVZ/etc., are generally more secure than running as an unprivileged user, but lighter-weight than full OS virtualization.
Also, for Java, you might trust the JVM's built-in sandboxing, and for compiling C++, NaCl claims to be able to sandbox x86 code.
But as Checkers' answer states, it's been proven possible to cause malicious damage from almost any "sandbox" in the past, and I would expect more holes to be continually found (and hopefully fixed) in the future. Do you really want to be running untrusted code?
Reading the codepad.org/about page might give you some cool ideas.
http://codepad.org/about
Running under unprivileged user still allows a local attacker to exploit vulnerabilities to elevate privileges.
Allowing to execute code in a VM can be insecure as well; the attacker can gain access to host system, as recent VMWare vulnerability report has shown.
In my opinion, allowing running native code on your system in the first place is not a good idea from security point of view. Maybe you should reconsider allowing them to run native code, this will certainly reduce the risk.
Check out ulimit and friends for ways of limiting the underprivileged user's ability to DOS the machine.
Try learning a little about setting up policies for SELinux. If you're running a Red Hat box, you're good to go since they package it into the default distro.
This will be useful if you know the things to which the code should not have access. Or you can do the opposite, and only grant access to certain things.
However, those policies are complicated, and may require more investment in time than you may wish to put forth.
Use Ideone API - the simplest way.
try using lxc as a container for your apache server
Not sure about how much effort you want to put into this thing but could you run Xen like the VPS web hosts out there?
http://www.xen.org/
This would allow full root access on their little piece of the server without compromising the other users or the base system.
Related
I am trying to understand what are containers and what is their purpose?
I am a little bit confused. When I started to read about them I saw that they rely on the Linux namespaces (is it true?) - a way to isolate the process within the container from the other processes on the machine, and got the impression that their main purpose is security.
For instance, let's say that I own a server that runs multiple services. I also don't want that a single hacked service will be able to hack the whole system. So I put each service inside a container that will make the service unable to interfere the other processes inside the machine, like to kill them or to play with their memory and in that way eliminate the risk.
But later I saw other purposes like being able to ship the app easily? or something like that. so what is their main purpose? I also read that if their main purpose is security - they have a problem. because they run directly on the host kernel (again, is it true?)- and an exploit like the "dirty cow" will or was able to get out of the container and be able to corrupt the machine. So I ended reading about the gVisor - which from what I understood tries to secure the containers, and in some cases succeed. So - what does gVisor do differently? that it's able to secure the containers? is gVisor a container itself? or just a Runtime environment for containers?
eventually, I always see comparisons between containers and VM and I ask why? And when should I use them?
I don't know if anything that I wrote is correct, and I will be glad if you will point out my mistakes, and answer my questions. Yes, I know that there are a lot of them and I am sorry, but Thanks!
The answer below is not guaranteed to be concise. Anyone is welcomed to point out my mistakes.
It might be a little bit vague because many people mixed such concepts nowadays.
1. LXC
When I first got to know such concepts, container still meant LXC, a long-existed technique in Linux. IMHO, container is a complete process that does not simulate a kernel. The difference between a container and a normal process is that container provides a isolated view via cgroups, as if it was in a new operating system. But in fact, the containers still share the host kernel (you are right), so people do worry about the security, especially when you want to deploy it in a public cloud (I don't see people using LXC directly on public cloud yet).
Despite the potential insecurity, the convenience and lightweightness(fast boot, small memory fingerprint) of containers seem to outweigh its drawbacks in most of security-insensitive situations. Tools like docker and kubernetes make large-scale deployment and management more efficient.
2. Virtual Machine & Hardware-assisted virtualization
In contrast to container, the concept Virtual Machine represents another category of isolated execution environment. Considering that most of VMs leverages some hardware-accelerating techniques like VT-x, I will assume you are talking about hardware-assisted virtualization. Virtual Machine usually contains a full kernel inside it.
See this picture from Doug Chamberlain
The Intel VT-x technique provides 2 modes, root mode(privileged) and non-root mode(not privileged). Each mode has its own ring0-ring3 (e.g, non-root ring3, non-root ring0, root ring3, root ring 0). The whole virtual machine runs in non-root mode, and the hypervisor(VMM, e.g., kvm) runs in root-mode.
In the classic qemu+kvm setup, qemu runs in root ring3, and kvm runs in root ring0.
The strong isolation and the existance of guest kernel makes virtual machine more secure and compatible. But, of course, the price is performance and efficiency (slower boot etc.)
Container-based Virtualization
People want the isolation of hardware-assisted virtualization, but don't want to give up the convenience of containers. Therefore, the hybrid solution seems really intuitive to come.
There are 2 typical solutions at present, Kata Container and [gVisor][6]
Kata Container tries to slim the whole stack of virtual machine to make it more lightweight. However, there is still linux inside it and it is still a virtual machine, but more lightweight.
gVisor claims to be an secure container, but it still leverages hardware virtualization techniques (or ptrace if you don't want virtualization). There is a component called sentry, which runs both in non-root ring0 and root ring3. The sentry will do part of the guest kernel's job, but is much smaller than linux. If sentry could not finish a request itself, it proxy the request down to the host kenrel.
The reason why most people believe gvisor is somewhat more secure is that it achieves "defense in depth" -- more layers of indirection lead people to believe it is more secure. This is usually true, but again, is not a guarantee.
I found this answer on learning Linux Kernel Programming and my question is more specific for the security features of the Linux Kernel. I want to know how to limit privileged users or process's access rights to other processes and files in contrast to full access of root.
Until now I found:
user and group for Discretionary Access Control (DAC), with differentiation in read, write and execute for user, group and other
user root for higher privileged tasks
setuid and setgid to extend users's DAC and set group/user ID of calling process, e.g. user run ping with root rights to open Linux sockets
Capabilities for fine-grained rights, e.g. remove suid bit of ping and set cap_net_raw
Control Groups (Cgroups) to limit access on resources i.e. cpu, network, io devices
Namespace to separate process's view on IPC, network, filesystem, pid
Secure Computing (Seccomp) to limit system calls
Linux Security Modules (LSM) to add additional security features like Mandatory Access Control, e.g. SELinux with Type Enforcement
Is the list complete? While writing the question I found fanotify to monitor filesystem events e.g. for anti virus scans. Probably there are more security features available.
Are there any more Linux security features which could be used in a programmable way from inside or outside of a file or process to limit privileged access? Perhaps there is a complete list.
The traditional unix way to limit a process that somehow needs more privileges and yet contain it so that it cannot use more than what it needs is to "chroot" it.
chroot changes the apparent root of a process. If done right, it can only access those resources inside that newly created chroot environment (aka. chroot jail)
e.g. it can only access those files, but also, only those devices etc.
To create a process that does this willingly is relatively easy, and not that uncommon.
To create an environment where an existing piece of software (e.g. a webserver, mailserver, ...) feels at home in and still functions properly is something that requires experience. The main thing is to find the minimal set of resources needed (shared libraries, configuration files, devices, dependent services (e.g. syslog), ... ).
You may add
EFS,AppArmor,Yama
auditctl,ausearch,aureport
Tools similar to fanotify:
Snort, ClamAV,OpenSSL,AIDE, nmap, GnuPG
In the docs they say:
The customers’ applications can now be run (using the process manager
of your choice, such as rc.local, Running uWSGI via Upstart,
Supervisord or whatever strikes your fancy) with a different uid and a
limited (if you want) address space for each socket:
Is it really necessary to do this? If yes - why?
It is a general security rule. Immagine one of your app being compromised, if it runs with the same permissions of the others it will be potentially able to damage them as well.
A Docker blog post indicates:
Docker containers are, by default, quite secure; especially if you
take care of running your processes inside the containers as
non-privileged users (i.e. non root)."
So, what is the security issue if I'm running as a root under the docker? I mean, it is quite secure if I take care of my processes as non-privileged users, so, how can I be harmful to host in a container as a root user? I'm just asking it to understand it, how can it be isolated if it is not secure when running as root? Which system calls can expose the host system then?
When you run as root, you can access a broader range of kernel services. For instance, you can:
manipulate network interfaces, routing tables, netfilter rules;
create raw sockets (and generally speaking, "exotic" sockets, exercising code that has received less scrutiny than good old TCP and UDP);
mount/unmount/remount filesystems;
change file ownership, permissions, extended attributes, overriding regular permissions (i.e. using slightly different code paths);
etc.
(It's interesting to note that all those examples are protected by capabilities.)
The key point is that as root, you can exercise more kernel code; if there is a vulnerability in that code, you can trigger it as root, but not as a regular user.
Additionally, if someone finds a way to break out of a container, if you break out as root, you can do much more damage than as a regular user, obviously.
You can reboot host machine by echoing to /proc/sysrq-trigger on docker. Processes running as root in docker can do this.
This seems quite good reason not to run processes as root in docker ;)
This is an imaginary example of what I like to do. Don't take it too literally.
Let say my process is being ran as www-data and I have a lua script called thedevil.lua. It will try to delete, corrupt and cause as much problems as possible. I'd like to fire up a process (or load a shared object) that has a lua interpreter and it will try to ruin all my websites as the user is www-data.
Is there a way I can say lets create this process (or load a library) with LIMITED permissions. Say the script is in /var/www/devilscript/thedevil.lua. I'd like to give it permissions for /tmp/www/devilscript and /var/www/devilscript/. Is that possible? I don't want to create a new user called devilscript and give it limited permissions than run the process as that user. I just want to say I am www-data but I only want to give this process/lib a subset of what I can do.
-edit- Could you give me the name of the functions to execute the said so or binary with lower permissions?
-edit2- Can windows do something like I asked?
Yes, depending on the operating system you are running on, there are various sorts of sandboxing methods available in modern Unix systems. It depends a bit on which one you are running. Under Linux there are almost too many -- SELinux, Apparmor, Tomoyo, and others. FreeBSD has a Mandatory Access Control System as well as the Capsicum capabilities system. Mac OS X has a sandboxing system as well.
Most such systems allow you to reduce the privilege that a particular process gets in a fairly granular manner. In general, capability systems are easier to work with than Mandatory Access Control (MAC) systems, but they are less frequently available.
A primitive way of doing this sort of privilege restriction in older Unix systems was "chrooting" a process, that is, running it in a restricted part of the file hierarchy using the chroot system call. Unfortunately, that remains the only truly portable form of privilege reduction available in Unix systems -- you thus encounter it in the configuration systems of many system daemons.
SELinux will allow you to create a domain that has restricted access to various file contexts and resources, regardless of the user the process is running as (even root).