k8s check pod securityContext definition - linux

I need to know which combination of securityContext settings can lead to privilege escalation and can also let a pod control other pods in the cluster.
I want to check whether pods in the cluster are running as privileged pods, which can indicate that we may have a security issue, so I check for
privileged: true
However, under the securityContext: spec there are additional fields like
allowPrivilegeEscalation
runAsUser
procMount
capabilities
etc.,
which may be risky (I'm not sure about it).
My question is: if a pod is marked privileged: false but the other fields are enabled, as in the following example, does this indicate a security issue? Can such a pod perform operations on other pods, access external data, etc.?
For example, the following configuration indicates that the pod is not privileged but has allowPrivilegeEscalation: true:
securityContext:
allowPrivilegeEscalation: true
privileged: false
I want to know: which combination of securityContext settings in a pod's config can control other pods/processes in the cluster?

The securityContext settings are more related to the container itself and some access to the host machine.
allowPrivilegeEscalation allows a process to gain more permissions than its parent process. This is more related to setuid/setgid flags in binaries, but inside a container there is not much to worry about.
You can only control other containers on the host machine from inside a container if you have a hostPath volume, or something like that, allowing you to reach the runtime's .sock file, such as /run/crio/crio.sock or docker.sock. It is pretty obvious that, if you are concerned about this, allowing requests to the Docker API through the network should be disabled.
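To make that concrete, here is a minimal sketch of a pod spec that would grant this kind of control; the pod name and image are placeholders, not anything from the question:

apiVersion: v1
kind: Pod
metadata:
  name: sock-mount-example        # hypothetical name, for illustration only
spec:
  containers:
  - name: app
    image: alpine:3.18            # placeholder image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock   # anything in this pod can now drive the Docker daemon
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
      type: Socket

Anything that can talk to that socket can start privileged containers on the host, which is why hostPath mounts deserve as much scrutiny as the securityContext itself.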
Of course, all of this access is still ruled by DAC and MAC restrictions. This is why podman's uidmap is better, because root inside the container does not have the same root ID outside the container.
From the Kubernetes point of view, you don't need this kind of privilege; all you need is a ServiceAccount and the correct RBAC permissions to control other things inside Kubernetes. A ServiceAccount bound to the cluster-admin ClusterRole can do anything in the API and much more, like adding SSH keys to the hosts.
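For illustration, a hedged sketch of such an over-powerful binding (the binding and ServiceAccount names are hypothetical; cluster-admin is the built-in ClusterRole):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: overly-powerful-binding   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin             # full control of the API
subjects:
- kind: ServiceAccount
  name: my-app                    # hypothetical ServiceAccount
  namespace: default

Any pod running under that ServiceAccount can manage every other pod in the cluster, no privileged securityContext required.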
If you are concerned about pods executing things in Kubernetes or on the host, just force the use of non-root containers, avoid indiscriminate use of hostPath volumes, and control your RBAC.
OpenShift uses a very nice set of restrictions by default:
Ensures that pods cannot run as privileged
Ensures that pods cannot mount host directory volumes
Requires that a pod is run as a user in a pre-allocated range of UIDs (an OpenShift feature; random UID)
Requires that a pod is run with a pre-allocated MCS label (SELinux related)
I don't answer exactly what you asked, because I shifted the attention to RBAC, but I hope this gives you a good idea.

Strictly in the scope of securityContext (as of the Kubernetes 1.26 API), here are a few things that may be risky:
Certainly risky
capabilities.add will add Linux capabilities (like CAP_SYS_TIME to set the system time) to a container. The default set depends on the container runtime (see for example Docker's default set of capabilities) and should be reasonably secure, but adding capabilities like CAP_SYS_ADMIN may represent a risk. Excessive capabilities outlines a few possible escalations.
privileged: true grants all capabilities, so you'll definitely want to check for that (as you already do).
allowPrivilegeEscalation: true is risky as it allows a process to gain more privileges than its parent.
procMount (when set to Unmasked) removes the default masking of the container's /proc and can expose sensitive host information.
windowsOptions may be risky. According to the Kubernetes doc it enables privileged access to the Windows node. I don't know much about Windows security, but I'd say risky :-) A spec combining several of these risky settings is sketched right after this list.
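As an illustration only (pod name and image are placeholders), a minimal sketch of a pod that is not privileged yet still combines the risky settings above:

apiVersion: v1
kind: Pod
metadata:
  name: risky-pod-example         # hypothetical name
spec:
  containers:
  - name: app
    image: alpine:3.18            # placeholder image
    securityContext:
      privileged: false                # not privileged, but still risky because of:
      allowPrivilegeEscalation: true   # setuid binaries may gain extra privileges
      procMount: Unmasked              # /proc is not masked (needs the ProcMountType feature gate)
      capabilities:
        add: ["SYS_ADMIN"]             # an extremely broad capability

So privileged: false alone is not a clean bill of health; the other fields can still add up to a serious escalation path.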
Maybe risky (though usually intended to restrict permissions)
runAsGroup and runAsUser may be risky when set to root / 0. Given that by default the container runtime will probably run the container as root already, they are mostly used to restrict a container's permissions to a non-root user. But if your container runtime is configured to run containers as non-root by default, they might be used to bypass that and run a container as root.
seLinuxOptions may be used to provide an insecure SELinux context, but is usually intended to define a more secure context.
seccompProfile defines the system calls a container is allowed to make. It may be used to get access to sensitive system calls, though it's usually intended to restrict them.
(probably) Not risky
readOnlyRootFilesystem (default false) will make the container's root filesystem read-only.
runAsNonRoot (default false) will prevent a container from running as root.
capabilities.drop will drop Linux capabilities, restricting further what a container can do. A restrictive securityContext built from these settings is sketched after this list.
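For contrast with the risky example above, a hedged sketch of a restrictive securityContext (the UID is an arbitrary illustrative value, not a recommendation from any doc):

securityContext:
  runAsNonRoot: true
  runAsUser: 10001                # arbitrary non-root UID, for illustration
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]                 # drop everything, add back only what's needed
  seccompProfile:
    type: RuntimeDefault          # restrict syscalls to the runtime's default profile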
You can read more in the official Configure a Security Context documentation.
What about non-Security Context related risks?
Security Context is not the only thing you should be wary of: you should also consider volume mounts to insecure locations, RBAC, network, Secrets, etc. A good overview is provided by the Security Checklist.

Related

Kubernetes: privileged containers and security concerns

Running a container in privileged mode is discouraged for security reasons.
For example: https://www.cncf.io/blog/2020/10/16/hack-my-mis-configured-kubernetes-privileged-pods/
It seems obvious to me that it is preferable to avoid privileged containers when a non-privileged container would be sufficient.
However, let's say I need to run a service that requires root access on the host to perform some tasks. Is there an added security risk in running this service in a privileged container (or with some Linux capabilities) rather than, for example, as a daemon that runs as root (or with those same Linux capabilities)? What is the added attack surface?
If a hacker manages to run a command in the context of the container, all right, it is game over. But what kind of vulnerability would allow them to do so that couldn't also be exploited in the case of the aforementioned daemon (apart from sharing the kubeconfig file thoughtlessly)?
Firstly, and as you said, it is important to underline that running a container in privileged mode is highly discouraged for some obvious security reasons, and here is why:
The risk of running a privileged container lies in the fact that it has access to the host's resources, including the ability to modify the host's system files, access sensitive information, and gain elevated privileges. Basically, as it grants the container more permissions than it would have in non-privileged mode, it significantly increases the attack surface.
If a hacker gains access to a privileged container, they can potentially access and manipulate the host system, move laterally to other systems, and compromise the security of your entire infrastructure. A similar vulnerability in a daemon running as root or with additional Linux capabilities would carry the same risk, as the hacker would have access to the same resources and elevated privileges.
In both cases, it is very important to follow best practices for securing the system, such as reducing the attack surface, implementing least privilege, and maintaining proper network segmentation to reduce the risk of compromise. When some host-level power is genuinely needed, granting individual capabilities instead of full privileged mode is one way to apply least privilege, as sketched below.
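A hedged sketch of that least-privilege idea (pod name, image, and the chosen capability are illustrative, not from the question):

apiVersion: v1
kind: Pod
metadata:
  name: least-privilege-example   # hypothetical name
spec:
  containers:
  - name: net-tool
    image: alpine:3.18            # placeholder image
    securityContext:
      privileged: false
      capabilities:
        add: ["NET_ADMIN"]        # only the one capability the task needs, instead of all of them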
In this security article written by the Astra Security team, they mention a PHP remote code execution vulnerability (2020) through which an attacker can take hold of your server. If that process is run by a non-root user, the attack surface is reduced; but if the same service has root access, the attacker can reach the remaining containers. This is why it's always preferred to configure least-privilege access for all services. Also go through this document for an overview of attacks that can be performed using privileged containers.

Using runAsNonRoot in Kubernetes

We've been planning for a long time to introduce securityContext: runAsNonRoot: true as a requirement in our pod configurations.
Testing this today, I've learnt that since v1.8.4 (I think) you also have to specify a particular UID for the user running the container, e.g. runAsUser: 333.
This means we not only have to tell developers to ensure their containers don't run as root, but also to specify a particular UID that they should run as, which makes this significantly more problematic for us to introduce.
Have I understood this correctly? What are others doing in this area? To leverage runAsNonRoot, is it now required that Docker containers run with a specific and known UID?
The Kubernetes Pod SecurityContext provides two options, runAsNonRoot and runAsUser, to enforce non-root users. You can use the two options separately from each other because they test for different configurations.
When you set runAsNonRoot: true, you require that the container run with a user with any UID other than 0. It does not matter which UID your user has.
When you set runAsUser: 333, you require that the container run with a user with UID 333. The snippet below shows the two options side by side.
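A minimal sketch of the two options (the UID 333 comes from the question; treat the rest as illustrative):

# Option 1: any non-root UID is acceptable
securityContext:
  runAsNonRoot: true

# Option 2: pin the container to one specific UID
securityContext:
  runAsUser: 333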
What are others doing in this area?
We are using runAsUser in situations where we don't want root to be used. Granted, those situations are not as frequent as you might think, since the philosophy of deploying 'processes' as separate pod containers inside a Kubernetes cluster architecture differs from a traditional compound monolithic deployment on a single host, where the security implications of a breach are quite different...
Most of our local development is done either on Minikube or Docker Edge with k8s manifests, so the setup is as close as possible to our deployment cluster (apart from obvious limits). With that said, we don't have issues with user ID allocation, since initialization of persistent volumes is not done externally, so all file user/group ownership is handled within pods with proper file permissions. On the very rare occasions that Docker is used for development, the developer is instructed to set appropriate permissions manually across mounted volumes, but that rarely happens.

Is there a way to restrict untrusted container scheduler?

I have an application to which I'd like to give the privilege to launch short-lived tasks and schedule them as Docker containers. I was thinking of doing this simply via docker run.
As I want to make the attack surface as small as possible, I treat the application as untrusted. As such, it can potentially run arbitrary docker run commands (if the codebase contained a bug, or the container was compromised, or input was improperly escaped somewhere, etc.) against a predefined Docker API endpoint.
This is why I'd like to restrict that application (effectively a scheduler) in some ways:
prevent --privileged use
enforce --read-only flag
enforce memory & CPU limits
I looked at a couple of options:
SELinux
The SELinux policies would need to be set at the host level and then propagated inside the containers via the --selinux-enabled flag at the daemon level. The scheduler can, however, override this anyway via run --privileged.
seccomp profiles
These are only applied at the time of launching the container (seccomp flags are available for docker run).
AppArmor
This can (again) be overridden at the scheduler level via --privileged.
The Docker daemon's --exec-opts flag
Only a single option is actually available for this flag (native.cgroupdriver).
It seems that Docker is designed to trust container schedulers by default.
Does anyone know if this is a design decision?
Is there any other possible solution available with the current latest Docker version that I missed?
I also looked at Kubernetes and its Limit Ranges & Resource Quotas, which can be applied to K8s namespaces and which looked interesting, assuming there's a way to force certain schedulers to only use certain namespaces. This would, however, increase the scope of this problem to operating a K8s cluster; a sketch of those namespace-level limits follows below.
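For what it's worth, a minimal hedged sketch of the Kubernetes namespace-level limits mentioned above (the names, namespace, and values are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: task-limits               # hypothetical name
  namespace: untrusted-tasks      # hypothetical namespace dedicated to the scheduler
spec:
  limits:
  - type: Container
    default:                      # applied when a container specifies no limits
      cpu: "500m"
      memory: 256Mi
    max:                          # hard ceiling per container
      cpu: "1"
      memory: 512Mi

Combined with RBAC that confines the scheduler's credentials to that one namespace, this enforces memory and CPU limits regardless of what the scheduler asks for.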
Running Docker on a Unix platform should be compatible with nice, or so I would think at first. Looking a little more closely, it looks like you need something like --cpuset-cpus="0,1".
From the second link: "The --cpu-quota looks to be similar to the --cpuset-cpus ... allocate one or a few cores to a process, it's just time managed instead of processor number managed."

What are the potential security problems running untrusted code in a Docker container as a non-root user?

I've seen plenty of ink spilled by now about how Docker is not sufficiently isolated to allow arbitrary containers to be run in a multi-tenant environment, and that makes sense. "If it's root in Docker, consider it root in the host machine." What about non-root though?
If I want to take some untrusted code and run it in a container, can it be done safely so long as the container is running as a non-root non-sudo user? What are the potential security pitfalls of doing something like that?
I'm fairly sure there are production applications doing this today (CI systems, runnable pastebins), but are they just lucky not to have had a determined attacker or is this a reasonable thing to do in a production system?
As of Docker v1.12, if one runs a container as a non-root user with user namespaces enabled, there are two levels of privilege escalation a malicious actor needs to perform in order to become root on the host:
Escalate from the non-root user to the root user inside the container
Escalate from the root user in the container to the root user on the host
So if untrusted code is run inside a Docker container as a non-root user, it will be slightly more difficult for an attacker to become root on the host, since we add an extra step of becoming root inside the container. That's the only advantage in terms of security compared to running containers with root privileges.
In case of privilege escalation through both layers of security, the following should help restrict the attack surface:
Workloads (more specifically, Docker containers in this context) with different trust levels should be isolated from each other by use of overlay networks, following the least-privilege principle.
Enable the available Linux security modules in enforcement mode (e.g. SELinux, AppArmor). A compose-style sketch of such a locked-down service appears after the references below.
References:
Running with non-root privileges inside containers: https://groups.google.com/forum/#!msg/docker-user/e9RkC4y-21E/JOZF8H-PfYsJ
Overlay networks: https://docs.docker.com/engine/userguide/networking/get-started-overlay/
User namespaces: https://docs.docker.com/engine/security/security/#/other-kernel-security-features
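A hedged docker-compose-style sketch of those restrictions applied to a single untrusted service (service name, image, and UID are illustrative):

services:
  untrusted:
    image: alpine:3.18            # placeholder image
    user: "10001:10001"           # run as a non-root UID/GID
    read_only: true               # read-only root filesystem
    security_opt:
      - no-new-privileges:true    # block setuid-based privilege escalation
      - apparmor=docker-default   # AppArmor profile in enforcement mode
    cap_drop:
      - ALL                       # drop all Linux capabilities
    networks:
      - untrusted-net             # isolate on its own network
networks:
  untrusted-net:
    driver: bridge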
All containers share the same kernel.
In case your untrusted code manages to perform a kernel exploit, it can do whatever it wants on the host and/or in any other running container.

containers and host user space shared when created using virsh

I'm trying to set up a container on Red Hat. The container should also run the same Red Hat version as the host. While exploring this, I came across virsh and Docker. Virsh supports host-based containers and shares user space with the host machine. Here I got confused by "user space": does it mean filesystem space or something else? Can anyone clarify this for me? Also, in which scenarios/cases can virsh (host-based containers) be used, so that I can conclude whether it's better to use virsh or Docker? In my case I need to set up a Red Hat container on a Red Hat host and run multiple instances of the same process, one in each container. The containers should exchange data with each other without using a network interface.
This should help clarify: http://rhelblog.redhat.com/2015/07/29/architecting-containers-part-1-user-space-vs-kernel-space/
It sounds like you really want to use Docker with -v bind mounts (or volumes) to share data. That is an article for a future day :-) A sketch of the shared-volume idea is shown after the link below.
https://docs.docker.com/userguide/dockervolumes/
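For the exchange-data-without-networking part, a hedged compose-style sketch using a shared volume (service names, image, and paths are all illustrative):

services:
  writer:
    image: alpine:3.18            # placeholder image
    command: ["sh", "-c", "echo hello > /shared/msg; sleep infinity"]
    volumes:
      - shared-data:/shared       # both containers mount the same volume
  reader:
    image: alpine:3.18
    command: ["sh", "-c", "sleep 1; cat /shared/msg; sleep infinity"]
    volumes:
      - shared-data:/shared
volumes:
  shared-data:                    # data is exchanged via the filesystem, no network needed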
User namespaces are not yet usable in many current setups.
This is a known limitation of current containerization solutions. Unfortunately, while user namespaces were implemented in recent kernel releases (starting from kernel 3.8, http://kernelnewbies.org/Linux_3.8), they are not yet enabled in many mainstream distributions.
This is one of the strongest limitations of containers right now: if you are root (UID 0) in a container, you are root across the machine operating the container.
This is a problem affecting any product based on LXC, though there is a strong push to fix it. It is actually a needed thing!
The alternatives are to go for hard SELinux jailing, or to work with unprivileged user accounts, assigning a different user per container.
From the Libvirt documentation (https://libvirt.org/drvlxc.html):
User and group isolation
If the guest configuration does not list any ID mapping, then the user and group IDs used inside the container will match those used outside the container. In addition, the capabilities associated with a process in the container will infer the same privileges they would for a process in the host. This has obvious implications for security, since a root user inside the container will be able to access any file owned by root that is visible to the container, and perform more or less any privileged kernel operation. In the absence of additional protection from sVirt, this means that the root user inside a container is effectively as powerful as the root user in the host. There is no security isolation of the root user.
The ID mapping facility was introduced to allow for stricter control over the privileges of users inside the container. It allows apps to define rules such as "user ID 0 in the container maps to user ID 1000 in the host". In addition the privileges associated with capabilities are somewhat reduced so that they cannot be used to escape from the container environment. A full description of user namespaces is outside the scope of this document, however LWN has a good write-up on the topic. From the libvirt point of view, the key thing to remember is that defining an ID mapping for users and groups in the container XML configuration causes libvirt to activate the user namespace feature.
