Kubernetes pod security policies UID/GID ranges - security

I need to allow the ranges 0-1000 and 6000-7000 to be used for application deployments, and forbid all others.
Will this configuration prevent someone from exec'ing into a pod/container and switching to some other UID/GID?

Linux does not normally permit non-root users to run processes as other UIDs/GIDs without something like sudo. As long as you also limit capabilities, privileged mode, privilege escalation, and unsafe mount types, you can be fairly certain your pods will only run processes as the UIDs/GIDs that you specify.
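For reference, here is a minimal sketch of how those ranges could be expressed in a PodSecurityPolicy (deprecated since Kubernetes 1.21 and removed in 1.25); the policy name is illustrative, and the capability, escalation and volume restrictions mirror the advice above:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-uid-ranges        # illustrative name
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  runAsUser:
    rule: MustRunAs
    ranges:                          # only UIDs 0-1000 and 6000-7000 are allowed
      - min: 0
        max: 1000
      - min: 6000
        max: 7000
  runAsGroup:
    rule: MustRunAs
    ranges:                          # the same restriction for GIDs
      - min: 0
        max: 1000
      - min: 6000
        max: 7000
  fsGroup:
    rule: MustRunAs
    ranges:
      - min: 0
        max: 1000
      - min: 6000
        max: 7000
  supplementalGroups:
    rule: MustRunAs
    ranges:
      - min: 0
        max: 1000
      - min: 6000
        max: 7000
  seLinux:
    rule: RunAsAny
  volumes:                           # no hostPath or other unsafe mount types
    - configMap
    - secret
    - emptyDir
    - persistentVolumeClaim

On clusters newer than 1.25 the same intent has to be expressed through a policy engine such as Kyverno or OPA Gatekeeper, since the built-in Pod Security Admission profiles do not support custom UID/GID ranges.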

k8s check pod securityContext definition

I want to check whether any pod in the cluster is running as a privileged pod, which could indicate a security issue, so I check for
privileged: true
However, under the securityContext spec there are additional fields, like
allowPrivilegeEscalation
runAsUser
procMount
capabilities
etc.,
which may be risky (I'm not sure about that).
My question is: if a pod is marked privileged: false but the other fields are true, as in the following example, does this indicate a security issue? Can such pods perform operations on other pods, access external data, etc.?
For example, the following configuration indicates that the pod is not privileged, but has allowPrivilegeEscalation: true:
securityContext:
  allowPrivilegeEscalation: true
  privileged: false
I want to know which securityContext combinations in a pod config can control other pods/processes in the cluster?
The securityContext settings are more related to the container itself and to some access to the host machine.
allowPrivilegeEscalation allows a process to gain more permissions than its parent process. This is mostly related to setuid/setgid flags in binaries, but inside a container there is not much to get worried about.
You can only control other containers on the host machine from inside a container if you have a hostPath volume, or something like that, allowing you to reach the runtime's .sock file, such as /run/crio/crio.sock or docker.sock. It is pretty obvious that, if you are concerned about this, allowing requests to the Docker API through the network should be disabled as well.
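As an illustration of the pattern to look for, here is a hypothetical pod spec that mounts the Docker socket through a hostPath volume (all names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: socket-mount-example              # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0 # illustrative image
      volumeMounts:
        - name: dockersock
          mountPath: /var/run/docker.sock
  volumes:
    - name: dockersock
      hostPath:
        path: /var/run/docker.sock        # hands the container full control of the runtime

Any process in such a container can talk to the Docker API on the node and, for example, start new privileged containers, regardless of its own securityContext.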
Of course, all of this access is governed by DAC and MAC restrictions. This is why podman's uidmap is better: root inside the container does not have the same root ID outside the container.
From the Kubernetes point of view, you don't need this kind of privilege; all you need is a ServiceAccount and the correct RBAC permissions to control other things inside Kubernetes. A ServiceAccount bound to the cluster-admin ClusterRole can do anything in the API and much more, like adding ssh keys to the hosts.
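For illustration, this is the kind of dangerous binding an audit should flag; the binding and ServiceAccount names are hypothetical:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: overly-broad-admin             # hypothetical name
subjects:
  - kind: ServiceAccount
    name: app-sa                       # hypothetical ServiceAccount used by a pod
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin                  # full control over the whole API

Any pod running with that ServiceAccount can read Secrets, create pods, and modify other workloads cluster-wide, whatever its securityContext says.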
If you are concerned about pods executing things in Kubernetes or on the host, just force the use of non-root containers, avoid indiscriminate use of hostPath volumes, and control your RBAC.
OpenShift uses a very nice set of restrictions by default:
Ensures that pods cannot run as privileged
Ensures that pods cannot mount host directory volumes
Requires that a pod be run as a user in a pre-allocated range of UIDs (an OpenShift feature: random UIDs)
Requires that a pod be run with a pre-allocated MCS label (SELinux related)
This doesn't answer exactly what you asked, because I shifted the attention to RBAC, but I hope it gives you a good idea.
Strictly in the scope of securityContext (as of the Kubernetes 1.26 API), here are a few things that may be risky:
Certainly risky
capabilities.add will add Linux capabilities (like CAP_SYS_TIME to set the system time) to a container. The default set depends on the container runtime (see for example Docker's default set of capabilities) and should be reasonably secure, but adding capabilities like CAP_SYS_ADMIN may represent a risk. Excessive capabilities outlines a few possible escalations.
privileged: true grants all capabilities, so you'll definitely want to check for that (as you already do).
allowPrivilegeEscalation: true is risky as it allows a process to gain more privileges than its parent.
procMount (when set to Unmasked) disables the runtime's default masking and read-only protection of paths under the container's /proc, which can expose sensitive host information.
windowsOptions may be risky. According to the Kubernetes docs it enables privileged access to the Windows node. I don't know much about Windows security, but I'd say risky :-)
Maybe risky (though usually intended to restrict permissions)
runAsGroup and runAsUser may be risky when set to root/0. Given that by default the container runtime will probably run the container as root already, they are mostly used to restrict the container's permissions to a non-root user. But if your container runtime is configured to run containers as non-root by default, these fields could be used to bypass that and run a container as root.
seLinuxOptions may be used to provide an insecure SELinux context, but is usually intended to define a more secure context.
seccompProfile defines system calls a container is allowed to make. It may be used to get access to sensitive system calls, though it's usually intended to restrict them.
(probably) Not risky
readOnlyRootFilesystem (default false) makes the container's root filesystem read-only.
runAsNonRoot (default false) prevents a container from running as root.
capabilities.drop will drop Linux capabilities, restricting further what a container can do.
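Putting the restrictive settings together, a hardened container securityContext might look like the following sketch (the pod and image names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: hardened-example                  # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0 # illustrative image
      securityContext:
        privileged: false                 # no blanket capabilities
        allowPrivilegeEscalation: false   # sets no_new_privs on the process
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 10001                  # an arbitrary non-root UID
        capabilities:
          drop: ["ALL"]                   # drop everything, add back only what's needed
        seccompProfile:
          type: RuntimeDefault            # the runtime's default syscall filter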
You can read more in the official Configure a Security Context documentation.
What about non-Security Context related risks?
Security Context is not the only thing you should be wary of: you should also consider volume mounts to insecure locations, RBAC, the network, Secrets, etc. A good overview is provided by the Security Checklist.

Using runAsNonRoot in Kubernetes

We’ve been planning for a while now to introduce securityContext: runAsNonRoot: true as a requirement in our pod configurations.
Testing this today, I’ve learnt that since v1.8.4 (I think) you also have to specify a particular UID for the user running the container, e.g. runAsUser: 333.
This means we not only have to tell developers to ensure their containers don’t run as root, but also to specify a specific UID that they should run as, which makes this significantly more problematic for us to introduce.
Have I understood this correctly? What are others doing in this area? To leverage runAsNonRoot, is it now required that Docker containers run with a specific and known UID?
The Kubernetes Pod SecurityContext provides two options, runAsNonRoot and runAsUser, to enforce non-root users. You can use the two options independently of each other because they test for different configurations.
When you set runAsNonRoot: true, you require that the container run with a user whose UID is anything other than 0; it does not matter which UID that is.
When you set runAsUser: 333, you require that the container run with the user whose UID is 333.
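A minimal sketch showing both fields at the pod level (the pod and image names are illustrative). Note that with runAsNonRoot alone, the kubelet can only verify images that declare a numeric USER, which is likely what the question above ran into:

apiVersion: v1
kind: Pod
metadata:
  name: nonroot-example        # illustrative name
spec:
  securityContext:
    runAsNonRoot: true         # reject any container that would start as UID 0
    runAsUser: 333             # optionally pin the UID (overrides the image's USER)
  containers:
    - name: app
      image: registry.example.com/app:1.0   # illustrative image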
What are others doing in this area?
We are using runAsUser in situations where we don't want root to be used. Granted, those situations are not as frequent as you might think, since the philosophy of deploying 'processes' as separate pod containers inside a Kubernetes cluster differs from a traditional compound monolithic deployment on a single host, where the security implications of a breach are quite different...
Most of our local development is done either on minikube or Docker edge with k8s manifests, so the setup is as close as possible to our deployment cluster (apart from obvious limits). That said, we don't have issues with user ID allocation, since initialization of persistent volumes is not done externally, so all file user/group ownership is handled within pods with proper file permissions. On the rare occasions that Docker is used for development, the developer is instructed to set appropriate permissions manually across mounted volumes, but that rarely happens.

Disable certain Docker run options

I'm currently working on a setup to make Docker available on a high-performance cluster (HPC). The idea is that every user in our group should be able to reserve a machine for a certain amount of time and use Docker in a "normal" way, meaning accessing the Docker daemon via the Docker CLI.
To do that, the user would be added to the Docker group. But this imposes a big security problem for us, since this basically means that the user has root privileges on that machine.
The new idea is to make use of the user namespace mapping option (as described in https://docs.docker.com/engine/reference/commandline/dockerd/#/daemon-user-namespace-options). As I see it, this would tackle our biggest security concern: that root in a container is the same as root on the host machine.
But as long as users are able to bypass this via --userns=host, this doesn't increase security in any way.
Is there a way to disable this and other Docker run options?
As mentioned in issue 22223:
There are a whole lot of ways in which users can elevate privileges through docker run, e.g. by using --privileged.
You can stop this by:
either not directly providing access to the daemon in production, and using scripts,
(which is not what you want here)
or by using an auth plugin to disallow some options.
That is:
dockerd --authorization-plugin=plugin1
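The same thing can be set persistently in the daemon configuration file instead of on the command line; a sketch, assuming a plugin named plugin1 is installed, in /etc/docker/daemon.json:

{
  "authorization-plugins": ["plugin1"]
}

The plugin then sees every API request (including the host configuration behind docker run flags) and can deny, for example, anything that asks for --privileged or --userns=host.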
Which can lead to:

Is there a way to restrict untrusted container scheduler?

I have an application which I'd like to give the privilege to launch short-lived tasks and schedule these as docker containers. I was thinking of doing this simply via docker run.
As I want to make the attack surface as small as possible, I treat the application as untrusted. As such, it can potentially run arbitrary docker run commands (if the codebase contained a bug, the container was compromised, input was improperly escaped somewhere, etc.) against a predefined Docker API endpoint.
This is why I'd like to restrict that application (effectively a scheduler) in some ways:
prevent --privileged use
enforce --read-only flag
enforce memory & CPU limits
I looked at a couple of options:
SELinux
The SELinux policies would need to be set at the host level and then enabled for containers via the --selinux-enabled flag at the daemon level. The scheduler can, however, override this anyway via run --privileged.
seccomp profiles
These are only applied at the time the container is launched (seccomp flags are available for docker run).
AppArmor
This can (again) be overridden at the scheduler level via --privileged.
docker daemon --exec-opts flag
Only a single option is actually available for this flag (native.cgroupdriver).
It seems that Docker is designed to trust container schedulers by default.
Does anyone know if this is a design decision?
Is there any other possible solution available with the current latest Docker version that I missed?
I also looked at Kubernetes and its Limit Ranges & Resource Quotas, which can be applied to K8s namespaces. That looked interesting, assuming there's a way to force certain schedulers to use only certain namespaces; it would, however, expand the scope of this problem to operating a K8s cluster.
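For what it's worth, a sketch of such a quota, assuming the scheduler is confined to a dedicated namespace (the names and limits are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: scheduler-quota          # illustrative name
  namespace: untrusted-tasks     # the namespace the scheduler is confined to
spec:
  hard:
    pods: "20"                   # cap on concurrent short-lived tasks
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi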
Running Docker on a Unix platform should be compatible with nice, or so I would think at first. Looking a little more closely, it looks like you need something like --cpuset-cpus="0,1".
From the second link: "The --cpu-quota looks to be similar to the --cpuset-cpus ... allocate one or a few cores to a process, it's just time managed instead of processor number managed."
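For completeness, the flags mentioned in this thread combine roughly like this (the image name is illustrative); note that this only helps when the caller can be trusted to pass them, which is the crux of the question:

# Pin CPUs, cap CPU time and memory, and make the root filesystem read-only
docker run --cpuset-cpus="0,1" --cpu-quota=50000 --memory=512m --read-only untrusted-task:latest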

What are the potential security problems running untrusted code in a Docker container as a non-root user?

I've seen plenty of ink spilled by now about how Docker is not sufficiently isolated to allow arbitrary containers to be run in a multi-tenant environment, and that makes sense. "If it's root in Docker, consider it root in the host machine." What about non-root though?
If I want to take some untrusted code and run it in a container, can it be done safely so long as the container is running as a non-root non-sudo user? What are the potential security pitfalls of doing something like that?
I'm fairly sure there are production applications doing this today (CI systems, runnable pastebins), but are they just lucky not to have had a determined attacker or is this a reasonable thing to do in a production system?
As of Docker v1.12, if one runs a container as a non-root user with user namespaces enabled, there are two levels of privilege escalation a malicious actor needs to perform in order to become root on the host:
Escalate from a non-root user to the root user inside the container
Escalate from the root user in the container to the root user on the host
So if untrusted code is run inside a Docker container as a non-root user, it will be slightly more difficult for an attacker to become root on the host, since we add the extra step of becoming root inside the container. That's the only advantage in terms of security compared to running containers with root privileges.
To guard against privilege escalation through both layers of security, the following should help restrict the attack surface (see the sketch after this list):
Workloads (more specifically Docker containers, in this context) with different trust levels should be isolated from each other by use of overlay networks, following the least-privilege principle.
Enabling an available Linux security module in enforcement mode (e.g. SELinux, AppArmor)
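A minimal sketch of that setup, combining Docker's user-namespace remapping with a locked-down run command (the image name is illustrative):

# Daemon side: remap container root to an unprivileged UID range on the host
dockerd --userns-remap=default

# Run the untrusted workload as a non-root user with reduced privileges
docker run --user 1000:1000 \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  untrusted-code:latest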
References:
Running with non-root privileges inside containers: https://groups.google.com/forum/#!msg/docker-user/e9RkC4y-21E/JOZF8H-PfYsJ
Overlay networks: https://docs.docker.com/engine/userguide/networking/get-started-overlay/
User namespaces: https://docs.docker.com/engine/security/security/#/other-kernel-security-features
All containers share the same kernel.
If your untrusted code manages to perform a kernel exploit, it can do whatever it wants on the host and/or in any other running container.