Is there a way to restrict untrusted container scheduler? - security

I have an application which I'd like to give the privilege to launch short-lived tasks and schedule these as docker containers. I was thinking of doing this simply via docker run.
As I want to make the attack surface as small as possible, I treat the application as untrusted. As such it can potentially run arbitrary docker run commands (if the codebase contained bug or the container was compromised, input was improperly escaped somewhere etc.) against a predefined docker API endpoint.
This is why I'd like to restrict that application (effectively a scheduler) in some ways:
prevent --privileged use
enforce --read-only flag
enforce memory & CPU limits
I looked at couple of options:
selinux
the selinux policies would need to be set on the host level and then propagated inside the containers via --selinux-enabled flag on the daemon level. The scheduler can however override this anyway via run --privileged.
seccomp profiles
these are only applied at a time of launching the container (seccomp flags are available for docker run)
AppArmor
this can (again) be overriden on the scheduler level via --privileged
docker daemon --exec-opts flag
only a single option is actually available for this flag (native.cgroupdriver)
It seems that Docker is designed to trust container schedulers by default.
Does anyone know if this is a design decision?
Is there any other possible solution available w/ current latest Docker version that I missed?
I also looked at Kubernetes and its Limit Ranges & Resource Quotas which can be applied to K8S namespaces, which looked interesting, assuming there's a way to enforce certain schedulers to only use certain namespaces. This would however increase the scope of this problem to operating K8S cluster.

running docker on a unix platform should be compatible with nice Or so I would think at first looking a little more closely it looks like you need somethign like -cpuset-cpus="0,1"
From the second link , "The --cpu-quota looks to be similar to the --cpuset-cpus ... allocate one or a few cores to a process, it's just time managed instead of processor number managed."

Related

Docker containers instead of multiprocessing

One of the main application of Docker containers is load-balancing. For example, in the case of a web application, instead of having only one instance handling all requests, we have many containers doing exactly the same thing, but the requests are split toward all of these instances.
But can it be used to do the same service, but with different "parameters"?
For instance, let's suppose I want to create a platform storing crypto-currency data from different exchange platforms (Bitfinex, Bittrex, etc.).
A lot of these platforms are handling web sockets. So in order to create one socket per platform, I would do something at the "code layer" like (language agnostic):
foreach (platform in platforms)
client = createClient(platform)
socket = client.createSocket()
socket.GetData()
Now of course, this loop would be stuck on the first iteration, because the websocket is waiting (although I could use asynchrony, anyway). To circumvent that, I could use multiprocessing, something like:
foreach (platform in platforms)
client = createClient(platform)
socket = client.createSocket()
process = new ProcessWhichGetData(socket)
process.Launch()
Is there any way to do that at a "Docker layer", I mean to use Docker to make the different containers handling different platforms? I would have one Docker container for Bittrex, one Docker container for Bitfinex, etc.
I know this would imply that either the different containers would communicate between each other (who takes care of Bitfinex? who takes care of Bittrex?), or the container orchestrator (Docker Swarm / Kubernete) would handle itself this "repartition".
Is it something we could do, and, on top of that, is it something we want?
Docker containerization simply adds various layers of isolation around regular user-land processes. It does not by itself introduces coordination among several processes, though it certainly can be exploited in building a multi-process system where each process perform some jobs, no matter if these jobs are redundant or complementary.
If you can design your solution so that one process is launched for each "platform" (for example, passing the specific platform an instance should handle as a command line parameter), then indeed, this can technically be done in Docker.
I should however point out that it is not clear why you would want to run each process in a distinct container. Is isolation pertinent for security reasons? For resource accounting? To have each process dispatched to a distinct host in order to have access to more processing power? Also, is there coordination required among these processes, outside of the having to initially determine which process handle which platform? If so, do they need to have access to a shared storage, or be able to send signals to each others? These questions will help you decide how to approach the dockerization of your solution.
In the most simple case, assuming that all you want is to have the whole process be isolated from the rest of the system, but with no requirement that these processes be isolated from each other, then the most simple strategy would simply to have a single container that contains an entrypoint shell script, which will simply launch one process per platform.
entrypoint.sh (inside your docker image):
#!/bin/bash
platforms=Bitfinex Bittrex
for platform in ${platforms} ; do
./myprogram "${platform}" &
done
If you really need a distinct container for each platform, then you would use a similar script, but this time, it would be run directly on the host machine (that is, outside of a container), and would encapsulate each process inside a docker container.
launch.sh (directly on the host):
#!/bin/bash
for platform in ${platforms} ; do
docker -name "program_${platform}" my_program_docker \
/usr/local/bin/myprogram "$platform"
done
Alternatively, you could use docker-compose to define the list of docker containers to be launched, but I will not discuss more this option at present (just ask if this seems pertinent to your you case).
If you need containers to be distributed among several host machines, then that same loop could be used, but this time, processes would be launched using docker-machine. Alternatively, if using docker-compose, the processes could be distributed using Swarm.
Say you restructured this as a long-running program that handled only one platform at a time, and controlled which platform it was via a command-line option or an environment variable. Instead of having your "launch all the platforms" loop in code, you might write a shell script like
#!/bin/sh
for platform in $(cat platforms.txt); do
./run_platform $platform &
done
This setup is easy to transplant into Docker.
You should not plan on processes launching Docker containers dynamically. This is hard to set up and has significant security implications (by which I mean "a bug in your container launcher could easily root your host").
If the individual processing tasks can all run totally independently (maybe they use a shared database to store data) then you're basically done. You could replace that shell script with something like a Docker Compose YAML file that lists out all of the containers; if you want to run this on multiple hosts you can use tools like Ansible, or Docker Swarm, or Kubernetes to spread the containers out (with varying levels of infrastructure complexity).
You can bunch the different docker containers in a STACK and also configure networking so that the docker containers can remain isolated form the outside world but can communicate with each other.
More info here Docker Stack

How to create a docker image of current file and OS system?

I wonder if one can take all the current environment variables settings OS applications and create a simple docker layer on top of it all so that docker container user will not be able to damage host system even if he would remove all files, yet will have abilety to access all installed applications and system settings inside his docker layer?
Technically you might be able to hack together a solution that does this by copying in all data/apps, installing dependencies, re-configuring the applications and providing a bash shell to attach to for a user to play around with but this is not what Docker is designed for at all, not to mention that I would not recommend anyone to attempt this.
I always try to explain docker's usecase as processes which run in isolated containers with defined interfaces that may be exposed. Meaning you would ideally run one application within it which has an interface exposed for communication.
What you are looking for is essentially a VM with snapshots which you can provide to different users.

Using multiple docker containers on the same host securely like isolated instances

I know, multiple Docker containers can be used in the same host, but can they be used securely like isolated instances? I want to run multiple secure and sandboxed containers such that no container can affect or access others.
For instance, can I serve nginx and apache containers which listen to different ports, with full trust that each container can only access their own files, resources etc?
In some sense you are asking the million dollar question with containers, and to be clear, IMHO there is no black and white answer to the question "is the platform/technology secure enough." It is a big (and important) enough question that the list of startups--not to mention amount of funding they've received--around container security is an appreciable number!
As noted in another answer, isolation for containers is realized through an assortment of Linux kernel capabilities (namespaces and cgroups), and adding more security to these capabilities is yet another set of technologies like seccomp, apparmor (or SELinux), user namespaces, or general hardening of the container runtime & node it is installed on (e.g. via the CIS benchmark guidelines). Out of the box default installation and default runtime parameters are probably not good enough for generically trusting in the kernel isolation primitives of Linux. However, this depends greatly on the trust level of what you are running across your container workloads. For example, is this all in-house within one organization? Can workloads be submitted from external sources? Obviously the spectrum of possibilities may greatly impact your level of trust.
If your use case is potentially narrow (for example, you mention web serving content from nginx or apache), and you are willing to do some work to handle base image creation, minimization and hardening; add to that a --readonly root filesystem and a capability limiting apparmor and seccomp profile, bind mount in the content served + writeable area, with no executables and ownership by an unprivileged user--all those things together might be enough for a specific use case.
However, there is no guarantee that a currently unknown security escape becomes a "0day" for Linux containers in the future, and that has led to promotion of lightweight virtualization that marries container isolation with actual hardware-level virtualization through shims from hyper.sh or Intel Clear Containers, as two examples. This is a happy medium between running a full virtualized OS with another container runtime and trusting kernel isolation with a single daemon on a single node. There is still a performance cost and memory overhead to adding this layer of isolation, but it is much less than a fully virtualized OS and work continues to make this less of a performance impact.
For a deeper set of information on all the "knobs" available for tuning container security, a presentation I gave last year several times is available on slideshare as well as via video from Skillsmatter.
The incredibly thorough "Understanding and Hardening Linux Containers" by Aaron Grattafiori is also a great resource with exhaustive detail on many of the same topics.
filesystem isolation (as well as memory and processes isolation) is a core feature of docker containers, based on the Linux Kernel abilities.
But if you wanted to be completely sure, you would deploy your containers on different nodes (each managed by their own docker daemons), each node being a VM (Virtual Machine) on your host, ensuring a complete sandbox.
Then a docker swarm or Kubernetes would be able to orchestrate those node and their containers, and make them communicate.
This is normally not needed when you have just a few linked containers: their should be able to be managed in isolation by a single docker daemon. You could use user namespace for additional isolation.
Plus, using nodes to separate containers implies different machines or different VM within the same machine.
And one big difference with a VM and a container is that a VM will preempt resources (allocate a fix minimal amount of disk/memory/CPU), which means you cannot launch an hundred VM, one per container. As opposed to a single docker instance, where a container, if it does nothing, won't consume much disk space/memory/CPU at all.

Disable certain Docker run options

I'm currently working on a setup to make Docker available on a high performance cluster (HPC). The idea is that every user in our group should be able to reserve a machine for a certain amount of time and be able to use Docker in a "normal way". Meaning accessing the Docker Daemon via the Docker CLI.
To do that, the user would be added to the Docker group. But this imposes a big security problem for us, since this basically means that the user has root privileges on that machine.
The new idea is to make use of the user namespace mapping option (as described in https://docs.docker.com/engine/reference/commandline/dockerd/#/daemon-user-namespace-options). As I see it, this would tackle our biggest security concern that the root in a container is the same as the root on the host machine.
But as long as users are able to bypass this via --userns=host , this doesn't increase security in any way.
Is there a way to disable this and other Docker run options?
As mentioned in issue 22223
There are a whole lot of ways in which users can elevate privileges through docker run, eg by using --privileged.
You can stop this by:
either not directly providing access to the daemon in production, and using scripts,
(which is not what you want here)
or by using an auth plugin to disallow some options.
That is:
dockerd --authorization-plugin=plugin1
Which can lead to:

docker and product versions

I am working for a product company and we do make lot of releases of the product. In the current approach to test multiple releases, we create separate VM and install all infrastructure softwares(db, app server etc) on top of it. Later we deploy the application WARs on the respective VM. Recently, I came across docker and it seems to be much helpful. Hence I started exploring it with the examples listed on the site. But, I am not able to find a way as how docker can be applied to build environment suitable to various releases?
Each product version will have db schema changes.
Each application WARs will have enhancements/defects etc.
Consider below example.
Every month, our company is releasing a new version of software and hence in order to support/fix defects we create VMs per release. Given the fact that if the application's overall size is 2 gb and OS takes close to 5 gb (apart from space it will also take up system resources for extra overhead). The VMs are required to restore any release and test any support issues reported against it. But looking at the additional infrastructure requirements, it seems that its very costly affair.
Can docker have everything required to run an application inside a container/image?
Can docker pack an application which consists of multiple WARs/DB schemas and when started allocate appropriate port?
Will there be any space/memory/speed differences compared to VM and docker assuming above scenario?
Do you think docker is still appropriate solution or should we continue using VMs? Can someone share pointers on how I can achieve above requirements with docker?
tl;dr: Yes, docker can run most applications inside a container.
Docker runs a single process inside each container. When using VMs or real servers, this one process is usually the init system which starts all system services. With docker it is usually your app.
This difference will get you faster startup times for your app (not starting the whole operating system). The trade off is that, if you depend on system services (such as cron, sshd…) you will need to start them yourself. There are some base images that provide a more "VM-like" environment… check phusion's baseimage for instance. To start more than a single process, you can also use a process manager such as supervisord.
Going forward, the recommended (although not required) approach is to start one process in each container (one per application server, one per database server, and so on) and not use containers as VMs.
Docker has no problems allocating ports either. It even has an explicit command on the Dockerfile: EXPOSE. Exposed ports can also be published on the docker host with the --publish argument of run so you don't even need to know the IP assigned to the container.
Regarding used space, you will probably see important savings. Docker images are created by stacking filesystem layers… this means that the common layers are only stored once on the server. In your setup, you will likely only have one copy of the base operating system layer (with VMs, you have a copy on each VM).
On memory you will probably see less significant savings (mostly caused by not starting all the operating system services). Speed is still a subject of research… A few things clear so far is that for faster IO you will need to use docker volumes and that for network heavy use cases you should use host networking. Check the IBM research "An Updated Performance Comparison of Virtual Machines and Linux Containers" for details. Or a summary like InfoQ's.

Resources