Using Ansible to implement cert rotation functionality in a Kubernetes cluster - linux

How can we use Ansible for certificate rotation on the different layers of a Kubernetes cluster?
We previously used fleet and are now migrating to Kubernetes.

If I understand your situation correctly, then I think you will be happiest with a DaemonSet that installs (and optionally monitors) ansible-pull.service and ansible-pull.timer on the Nodes.
The DaemonSet ensures the container is scheduled on every Node (unlike a CronJob or the like), and with /etc/systemd/system volume-mounted into the container plus go-systemd's ability to daemon-reload (along with the dbus socket, of course), the container can write out a suitably descriptive .service and .timer file for that Node.
Then ansible-pull will run as before, taking whatever steps your existing ansible playbooks did.
There are many approaches to how to achieve this similar action on non-Node machines, so I'll leave that as an exercise to the reader.
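For concreteness, here is a minimal sketch of what such a DaemonSet might look like; the image name, namespace, and the entrypoint that actually writes the unit files and issues the daemon-reload (e.g. via go-systemd over the mounted dbus socket) are all assumptions, not a definitive implementation:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ansible-pull-installer      # hypothetical name
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: ansible-pull-installer
      template:
        metadata:
          labels:
            app: ansible-pull-installer
        spec:
          containers:
            - name: installer
              image: example.com/ansible-pull-installer:latest   # hypothetical image
              volumeMounts:
                - name: systemd-units
                  mountPath: /host/etc/systemd/system   # where the .service and .timer files get written
                - name: dbus
                  mountPath: /var/run/dbus              # lets the container trigger a daemon-reload
          volumes:
            - name: systemd-units
              hostPath:
                path: /etc/systemd/system
            - name: dbus
              hostPath:
                path: /var/run/dbus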
I don't know what you define as the "Infrastructure" layer, but rotating the Kubernetes certs is relatively straightforward from ansible-pull's perspective: write out the new worker.pem and worker.key in /etc/kubernetes/ssl, bounce kubelet.service (or its hyperkube equivalent), voilà. Platform services higher up the stack I would expect to be managed by the (ReplicaSet|Deployment|ReplicationController|etc.) which owns them, meaning one can be a lot more declarative for in-cluster resources, having access to the full power of ConfigMap, Secret, Service, etc.
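As a rough illustration, the Node-level rotation described above could be expressed as an ansible-pull playbook along these lines; how the new certificate material is produced (a local CA role, Vault, etc.) is an assumption, hidden behind the new_worker_cert / new_worker_key placeholder variables:

    - name: Rotate kubelet certificates on this Node
      hosts: localhost
      become: true
      tasks:
        - name: Write new worker certificate
          copy:
            content: "{{ new_worker_cert }}"   # placeholder variable
            dest: /etc/kubernetes/ssl/worker.pem
            owner: root
            group: root
            mode: "0644"
          notify: bounce kubelet

        - name: Write new worker key
          copy:
            content: "{{ new_worker_key }}"    # placeholder variable
            dest: /etc/kubernetes/ssl/worker.key
            owner: root
            group: root
            mode: "0600"
          notify: bounce kubelet

      handlers:
        - name: bounce kubelet
          systemd:
            name: kubelet.service
            state: restarted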

Related

Kubernetes cluster Nodes not creating automatically when other lost in Kubespray

I have successfully deployed a multi-master Kubernetes cluster using the repo https://github.com/kubernetes-sigs/kubespray and everything works fine. But when I stop/terminate a node in the cluster, a new node does not join the cluster. I had deployed Kubernetes using kOps before, and there the nodes were created automatically when one was deleted. Is this the expected behaviour in Kubespray? Please help.
It is expected behavior because kubespray doesn't create any ASGs, which are AWS-specific resources. One will observe that kubespray only deals with existing machines; they do offer some terraform toys in their repo for provisioning machines, but kubespray itself does not get into that business.
You have a few options available to you:
Post-provision using scale.yml
Provision the new Node using your favorite mechanism
Create an inventory file containing it and the etcd machines (presumably so kubespray can issue etcd certificates for the new Node)
Invoke the scale.yml playbook
You may enjoy AWX in support of that.
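For illustration, a hypothetical inventory for that flow might look like the snippet below (hostnames and IPs are placeholders, and the group names vary by kubespray release, e.g. kube-node vs kube_node); you would then run something like ansible-playbook -i hosts.yml -b --limit=new-node-1 scale.yml:

    all:
      hosts:
        new-node-1:
          ansible_host: 10.0.0.15
        etcd-1:
          ansible_host: 10.0.0.4
        etcd-2:
          ansible_host: 10.0.0.5
        etcd-3:
          ansible_host: 10.0.0.6
      children:
        etcd:
          hosts:
            etcd-1:
            etcd-2:
            etcd-3:
        kube_node:
          hosts:
            new-node-1: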
Using plain kubeadm join
This is the mechanism I use for my clusters, FWIW
Create a kubeadm join token using kubeadm token create --ttl 0 (or whatever TTL you feel comfortable using)
You'll only need to do this once, or perhaps once per ASG, depending on your security tolerances
Use the cloud-init mechanism to ensure that docker, kubeadm, and kubelet binaries are present on the machine
You are welcome to use an AMI for doing that, too, if you enjoy building AMIs
Then invoke kubeadm join as described here: https://kubernetes.io/docs/setup/independent/high-availability/#install-workers
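A hypothetical cloud-init user-data snippet for that flow; the API endpoint, token, and CA hash are placeholders for your own values, and docker, kubeadm, and kubelet are assumed to already be on the machine per the previous step:

    #cloud-config
    runcmd:
      - >
        kubeadm join 10.0.0.10:6443
        --token abcdef.0123456789abcdef
        --discovery-token-ca-cert-hash sha256:<hash-of-your-cluster-ca>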
Use a Machine Controller
There are plenty of "machine controller" components that aim to use custom controllers inside Kubernetes to manage your node pools declaratively. I don't have experience with them, but I believe they do work. That link was just the first one that came to mind, but there are others, too
Our friends over at Kubedex have an entire page devoted to this question

Using runAsNonRoot in Kubernetes

We’ve been planning for a while now to introduce securityContext: runAsNonRoot: true as a requirement in our pod configurations.
Testing this today, I’ve learnt that since v1.8.4 (I think) you also have to specify a particular UID for the user running the container, e.g. runAsUser: 333.
This means we not only have to tell developers to ensure their containers don’t run as root, but also to specify a particular UID that they should run as, which makes this significantly more problematic for us to introduce.
Have I understood this correctly? What are others doing in this area? To leverage runAsNonRoot is it now required that Docker containers run with a specific and known UID?
The Kubernetes Pod SecurityContext provides two options, runAsNonRoot and runAsUser, to enforce non-root users. You can use the two options independently of each other because they test for different things.
When you set runAsNonRoot: true you require that the container run as a user with any UID other than 0; it does not matter which UID that is.
When you set runAsUser: 333 you require that the container run as the user with UID 333.
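In Pod-spec form the two settings look like this (the pod name and image are placeholders); either field can be used without the other:

    apiVersion: v1
    kind: Pod
    metadata:
      name: non-root-example
    spec:
      securityContext:
        runAsNonRoot: true   # refuse to start the container if it would run as UID 0
        runAsUser: 333       # additionally pin the container to exactly UID 333
      containers:
        - name: app
          image: example.com/app:1.0   # placeholder image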
What are others doing in this area?
We are using runAsUser in situations where we don't want root to be used. Granted, those situations are not as frequent as you might think, since the philosophy of deploying 'processes' as separate pod containers inside a Kubernetes cluster differs from a traditional compound monolithic deployment on a single host, where the security implications of a breach are quite different...
Most of our local development is done either on Minikube or Docker Edge with k8s manifests, so the setup is as close as possible to our deployment cluster (apart from the obvious limits). With that said, we don't have issues with user id allocation, since initialization of persistent volumes is not done externally, so all file user/group ownership is handled within pods with the proper file permissions. On the very rare occasions that plain Docker is used for development, the developer is instructed to set appropriate permissions manually across mounted volumes, but that rarely happens.

Running command on EC2 launch and shutdown in auto-scaling group

I'm running a Docker swarm deployed on AWS. The setup is an auto-scaling group of EC2 instances that each act as Docker swarm nodes.
When the auto-scaling group scales out (spawns new instance) I'd like to run a command on the instance to join the Docker swarm (i.e. docker swarm join ...) and when it scales in (shuts down instances) to leave the swarm (docker swarm leave).
I know I can do the first one with user data in the launch configuration, but I'm not sure how to act on shutdown. I'd like to make use of lifecycle hooks, and the docs mention I can run custom actions on launch/terminate, but it is never explained just how to do this. It should be possible to do without sending SQS/SNS/Cloudwatch events, right?
My AMI is a custom one based off of Ubuntu 16.04.
Thanks.
One of the core issues is that removing a node from a Swarm is currently a 2 or 3-step action when done gracefully, and some of those actions can't be done on the node that's leaving:
docker node demote, if leaving-node is a manager
docker swarm leave on leaving-node
docker node rm on a manager
This step 3 is what's tricky because it requires you to do one of three things to complete the removal process:
Put something on a worker that would let it do things on a manager remotely (ssh to a manager with sudo perms, or docker manager API access). Not a good idea. This breaks the security model of "workers can't do manager things" and greatly increases risk, so not recommended. We want our managers to stay secure, and our workers to have no control or visibility into the swarm.
(best if possible) Set up an external solution so that on EC2 node removal, a job is run to SSH or API into a manager and remove the node from the swarm. I've seen people do this, but can't remember a link/repo with full details on using a lambda, etc., to deal with the lifecycle hook.
Set up a simple cron on a single manager (or preferably a manager-only service running a cron container) that removes workers that are marked down. This is a somewhat blunt approach and has edge cases where you could potentially delete a node that still exists but is considered down/unhealthy by swarm, but I've not heard of that happening. If it were fancy, it could maybe validate with AWS that the node is indeed gone before removing it.
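As a rough stack-file sketch of that third option (the image, interval, and the "Down" check are assumptions, not a polished implementation), a manager-only service can list workers, remove the ones marked Down, and rely on the restart policy as a poor man's cron:

    version: "3.7"
    services:
      node-reaper:
        image: docker:cli               # assumption: any image that ships the docker CLI
        entrypoint: ["/bin/sh", "-c"]
        command:
          - |
            # list workers, keep the ones whose Status column reads "Down", remove them
            docker node ls --filter role=worker --format '{{.ID}} {{.Status}}' \
              | grep -w Down | cut -d ' ' -f 1 | xargs -r docker node rm
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock   # talk to the manager's daemon
        deploy:
          placement:
            constraints:
              - node.role == manager    # never schedule this on a worker
          restart_policy:
            condition: any
            delay: 5m                   # effectively "run every five minutes"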
WORST CASE, if a node goes down hard and doesn't do any of the above, it's not horrible, just not ideal for graceful management of user/db connections. After 30s a node is considered down and Service tasks will be re-created on healthy nodes. A long list of workers marked down in the swarm node list doesn't really affect your Services; it's just unsightly (as long as there are enough healthy workers).
THERE'S A FEATURE REQUEST in GitHub to make this removal easier. I've commented on what I'm seeing in the wild. Feel free to post your story and use case in the SwarmKit repo.

Using multiple docker containers on the same host securely like isolated instances

I know, multiple Docker containers can be used in the same host, but can they be used securely like isolated instances? I want to run multiple secure and sandboxed containers such that no container can affect or access others.
For instance, can I serve nginx and apache containers which listen on different ports, with full trust that each container can only access its own files, resources, etc.?
In some sense you are asking the million-dollar question with containers, and to be clear, IMHO there is no black-and-white answer to the question "is the platform/technology secure enough?" It is a big (and important) enough question that the number of startups--not to mention the amount of funding they've received--focused on container security is appreciable!
As noted in another answer, isolation for containers is realized through an assortment of Linux kernel features (namespaces and cgroups), and adding more security on top of those is yet another set of technologies like seccomp, AppArmor (or SELinux), user namespaces, or general hardening of the container runtime and the node it is installed on (e.g. via the CIS benchmark guidelines). An out-of-the-box default installation with default runtime parameters is probably not good enough for generically trusting the kernel isolation primitives of Linux. However, this depends greatly on the trust level of what you are running across your container workloads. For example, is this all in-house within one organization? Can workloads be submitted from external sources? Obviously the spectrum of possibilities can greatly impact your level of trust.
If your use case is potentially narrow (for example, you mention serving web content from nginx or apache), and you are willing to do some work on base image creation, minimization, and hardening; add to that a --read-only root filesystem and a capability-limiting AppArmor and seccomp profile; bind-mount in the content being served plus a writeable area, with no executables and ownership by an unprivileged user--all of those things together might be enough for a specific use case.
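Pulled together into a compose-file sketch (the image, profile names, and mount layout are all assumptions, and the seccomp/AppArmor profiles would have to exist on the host already), that narrow use case might look roughly like:

    services:
      web:
        image: nginxinc/nginx-unprivileged:alpine   # assumption: an image built to run as a non-root user
        read_only: true                             # read-only root filesystem
        cap_drop:
          - ALL                                     # start from zero Linux capabilities
        security_opt:
          - seccomp:./seccomp-nginx.json            # placeholder custom seccomp profile
          - apparmor:nginx-restricted               # placeholder AppArmor profile loaded on the host
        volumes:
          - ./site:/usr/share/nginx/html:ro         # bind-mounted content: read-only, no executables
        tmpfs:
          - /tmp                                    # small writeable areas the server still needs
          - /var/cache/nginx
        ports:
          - "8080:8080"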
However, there is no guarantee that a currently unknown security escape won't become a "0day" for Linux containers in the future, and that has led to the promotion of lightweight virtualization that marries container isolation with actual hardware-level virtualization through shims from hyper.sh or Intel Clear Containers, as two examples. This is a happy medium between running a fully virtualized OS with another container runtime and trusting kernel isolation with a single daemon on a single node. There is still a performance cost and memory overhead to adding this layer of isolation, but it is much less than a fully virtualized OS, and work continues to make it less of a performance impact.
For a deeper set of information on all the "knobs" available for tuning container security, a presentation I gave last year several times is available on slideshare as well as via video from Skillsmatter.
The incredibly thorough "Understanding and Hardening Linux Containers" by Aaron Grattafiori is also a great resource with exhaustive detail on many of the same topics.
Filesystem isolation (as well as memory and process isolation) is a core feature of Docker containers, based on Linux kernel features.
But if you wanted to be completely sure, you would deploy your containers on different nodes (each managed by its own docker daemon), each node being a VM (Virtual Machine) on your host, ensuring a complete sandbox.
Then a docker swarm or Kubernetes would be able to orchestrate those nodes and their containers, and make them communicate.
This is normally not needed when you have just a few linked containers: they should be able to be managed in isolation by a single docker daemon. You could use user namespaces for additional isolation.
Plus, using nodes to separate containers implies different machines, or different VMs within the same machine.
And one big difference between a VM and a container is that a VM will preempt resources (allocate a fixed minimal amount of disk/memory/CPU), which means you cannot launch a hundred VMs, one per container; as opposed to a single docker instance, where a container that does nothing won't consume much disk space/memory/CPU at all.

Is there a way to restrict untrusted container scheduler?

I have an application which I'd like to give the privilege to launch short-lived tasks and schedule these as docker containers. I was thinking of doing this simply via docker run.
As I want to make the attack surface as small as possible, I treat the application as untrusted. As such, it can potentially run arbitrary docker run commands (if the codebase contained a bug, the container was compromised, input was improperly escaped somewhere, etc.) against a predefined docker API endpoint.
This is why I'd like to restrict that application (effectively a scheduler) in some ways:
prevent --privileged use
enforce --read-only flag
enforce memory & CPU limits
I looked at couple of options:
selinux
the SELinux policies would need to be set at the host level and then propagated inside the containers via the --selinux-enabled flag at the daemon level. The scheduler can, however, override this anyway via run --privileged.
seccomp profiles
these are only applied at the time of launching the container (seccomp flags are available for docker run)
AppArmor
this can (again) be overridden at the scheduler level via --privileged
docker daemon --exec-opts flag
only a single option is actually available for this flag (native.cgroupdriver)
It seems that Docker is designed to trust container schedulers by default.
Does anyone know if this is a design decision?
Is there any other possible solution available w/ current latest Docker version that I missed?
I also looked at Kubernetes and its Limit Ranges & Resource Quotas, which can be applied to K8s namespaces and looked interesting, assuming there's a way to force certain schedulers to only use certain namespaces. This would however increase the scope of the problem to operating a K8s cluster.
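For what it's worth, a LimitRange plus ResourceQuota for a dedicated namespace might look roughly like this (the namespace name and the numbers are placeholders):

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: task-limits
      namespace: untrusted-tasks
    spec:
      limits:
        - type: Container
          default:              # applied when a container specifies no limits of its own
            cpu: 500m
            memory: 256Mi
          max:                  # hard ceiling per container
            cpu: "1"
            memory: 512Mi
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: task-quota
      namespace: untrusted-tasks
    spec:
      hard:
        pods: "20"
        requests.cpu: "4"
        requests.memory: 4Gi
        limits.cpu: "8"
        limits.memory: 8Gi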
Running Docker on a Unix platform should be compatible with nice, or so I would think at first. Looking a little more closely, it looks like you need something like --cpuset-cpus="0,1".
From the second link, "The --cpu-quota looks to be similar to the --cpuset-cpus ... allocate one or a few cores to a process, it's just time managed instead of processor number managed."
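In compose-file form (the service name, image, and values are placeholders), those two knobs would look something like:

    services:
      task:
        image: example.com/task:latest   # placeholder image
        cpuset: "0,1"                    # equivalent of docker run --cpuset-cpus="0,1"
        cpu_quota: 50000                 # with the default 100000us period, roughly half a core (--cpu-quota)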
