Kubernetes Docker OS parameters vs host OS parameters - Linux

I am running NGINX and Tomcat in Docker containers (the container OS is Red Hat Linux), deployed through Kubernetes pods. The host OS is also Red Hat Linux.
My question is which OS parameters take effect - those of the host OS or of the container OS? During performance tuning, do I need to tune both, or are only the host OS parameters effective?
Examples of the parameters I am referring to: ulimit -n (open files), net.ipv4.tcp.*, fs.file-max, etc.

As Crazykev already mentioned, you can set ulimits using the respective docker run flags.
Parameters like net.ipv4.tcp.* are kernel parameters. Docker containers are run in the same Linux kernel as the host system; for this reason, parameters set on the host will also be effective in the container.
Usually, you will not be able to set these parameters from inside a container. You can (not saying you should) start a container with the --privileged flag, which might (untested) give you access to setting kernel parameters from within the container. The Kubernetes docs also describe how to start privileged containers.
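For the namespaced kernel parameters, docker run also exposes a --sysctl flag (if your Docker version supports it), and ulimits can be set per container; a minimal sketch, where the parameter values are just placeholders:

# Set a namespaced kernel parameter and a file-descriptor limit for a single container
docker run -d \
  --sysctl net.core.somaxconn=1024 \
  --ulimit nofile=65536:65536 \
  nginx

# Non-namespaced parameters such as fs.file-max can only be set on the host
sysctl -w fs.file-max=2097152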

Inside a Docker container there is no separate kernel, so I'm not sure the container image should really be called an OS...
By the way, some of the parameters you mention cannot be set directly inside a Docker container, for safety and other reasons; check the Docker docs for the relevant flags (for example, for ulimits: docker run --ulimit nofile=262144:262144).

Kubernetes does not currently support setting ulimits; there is an open Kubernetes issue for that.
A similar question which asks about setting ulimits is answered here.
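Until that is resolved, one workaround (at the Docker daemon level on each node rather than in Kubernetes) is to set default ulimits on the daemon so that every container inherits them; a minimal sketch, with placeholder values:

# Contents of /etc/docker/daemon.json on each node (restart the Docker daemon afterwards)
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Hard": 65536, "Soft": 65536 }
  }
}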

Related

Obtaining Linux host metrics from within a docker container

Looking for some recommendations on how to report Linux host metrics, such as CPU and memory utilization and disk usage stats, from within a Docker container. The host will run a number of Docker containers.
One thought was to run top and other basic Linux commands from outside the container and push their output into a container folder with the appropriate authorization so that it can be consumed. Another thought was to use the Docker API to run docker stats for the containers, but this may not be the best option as it would not report on other processes running on the host that are not containerized. A third option would be to somehow execute something like top and other commands on the host from within the container; this option would be the most ideal for my situation.
I was just looking for some proven design patterns that others have used. Also, I don't have the ability to install a bunch of tools on the host, as this is a customer host and I have no control over what is already installed.
You may run your container in privileged mode, but be aware that this could compromise host security, as your container will no longer be in a sandboxed environment.
docker run -d --privileged --pid=host alpine:3.8 sh
When the operator executes docker run --privileged, Docker will enable access to all devices on the host as well as set some configuration in AppArmor or SELinux to allow the container nearly all the same access to the host as processes running outside containers on the host. Additional information about running with --privileged is available on the Docker Blog.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
Good reference: https://security.stackexchange.com/a/218379
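With the host PID namespace shared as above, standard tools inside the container already see host processes; another common pattern is to bind-mount the host's /proc read-only and read metrics from there. A minimal sketch, where the image and mount point are just placeholders:

# Host PID namespace: top/ps inside the container now report host processes
docker run --rm -it --pid=host alpine:3.8 top

# Alternative: expose the host's /proc read-only and read metrics from it
docker run --rm -it -v /proc:/host/proc:ro alpine:3.8 sh -c 'cat /host/proc/meminfo'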

How can I change memory dedicated to Docker in Linux? [duplicate]

This question already has answers here: How to increase/check default memory Docker has on Linux?
I'm trying to run the confluent cp-demo docker image.
https://docs.confluent.io/5.5.0/tutorials/cp-demo/docs/index.html
I'm using Ubuntu 20.04, and in order to start the container I need to increase Docker's max memory setting from the default 2 GB to 8 GB.
This can be done easily on Windows and Mac through the Docker Desktop app, but that isn't available on Ubuntu and I haven't found a way to modify it using the CLI. (I can only modify the memory of a container after I have started it with the CLI, but in order to start cp-demo it says I need to change the memory setting first.)
Does anyone know how can I do this?
As far as I know, on Mac and Windows you can control the memory and CPU given to the Docker application, but on Linux Docker uses the kernel's namespaces and cgroups directly, so you don't need to change anything on the daemon: Docker will make whatever free resources the host has available to containers.
For your requirements:
Make sure the host machine has sufficient free resources available.
Start cp-demo with a memory limit: docker run -it --memory="8g" cp-demo
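To confirm what is actually available and enforced, you can check both the host and the running container; a small sketch (the container name is just a placeholder, since cp-demo starts several containers via docker-compose):

# Memory available on the host (on Linux, Docker can use all of it by default)
free -h
docker info | grep -i 'total memory'

# After starting a container, verify the limit that is actually applied
docker stats --no-stream cp-demo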

Understanding the technical (Docker) container architecture

I am new to containers and would like to get a good knowledge about how container technology (Docker) is made up from 'scratch'. I have to write a paper and hope that I have every important thing correctly understood so far.
The following diagram is made by me and shows my current understanding of containers.
Obviously we need an OS with a Kernel that allows us to use the hardware. For Docker this is Linux. Docker for Windows uses a VM with Linux for that.
On top of our Linux OS we then run the Docker Engine. The Docker Engine is in charge of starting, building, configuring ... our images and containers. Most importantly, the Docker Engine handles everything that has to do with isolating containers: for example, it manages how namespaces and cgroups are used so that every container has its own full filesystem.
Then we have the actual containers. Containers themselves almost always need a kind of OS of their own. This is mostly a very compact one such as Alpine or BusyBox. These collect a small number of standard tools such as file, tar and grep that most software needs. This compact OS uses the kernel of our full Linux OS; containers don't have their own kernel.
On top of the compact OS we then place the actual piece of software, such as Node.js or an NGINX server. This software only uses the compact OS, which in turn uses the kernel of our full Linux OS. All data and modifications generated at runtime are written to the writeable layer of the container.
And if I understood correctly, our container, and everything that runs in it, is not using or interacting with our full Linux OS but only with its kernel?
I also don't quite understand how the writeable layer in a container works. Like how does my software for example know that a modified file from a read-only layer is now present in the writeable layer and should use this?
I would really appreciate some corrections or suggestions on what I have missed out so far. Thank you
And if I understood correctly, our container, and everything that runs in it, is not using or interacting with our full Linux OS but only with its kernel?
The containers are just processes. To the kernel, the Docker daemon, a Node.js application and Nginx are all just processes; that's why containers don't have their own kernels. The difference between the Docker daemon process (and other processes on the host) and processes running within containers is their scope, which is called a namespace. Processes in containers run in isolation and don't see anything outside their namespace. There are many different namespaces; for example, the PID namespace is one of them, and it limits the visibility of other processes. That's why the ps command in a container doesn't show processes from the host or from other containers. Namespaces are a kernel feature and are mostly about what a process can see and access, while cgroups apply limits on CPU and memory usage.
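You can see the PID namespace part of this without Docker at all, using unshare from util-linux; a small sketch:

# Start a shell in a new PID namespace with its own /proc mounted;
# ps now only sees the processes inside that namespace
sudo unshare --pid --fork --mount-proc sh -c 'ps aux'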
I hope this helps. I tried to focus on the kernel, because Docker is just a daemon that spawns new processes with configured namespaces, cgroups and their own filesystem.
Here are some links that might be useful:
What even is a container: namespaces and cgroups
How containers work: overlayfs
See also other posts about containers on https://jvns.ca (I recommend it because Julia explains things in simple words and even provides illustrations.)
If you want to go deeper, I'd suggest looking at the Namespaces: from chroot() to containers slides and reading the article about creating your own containers.
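Regarding the writeable-layer question: the overlayfs article above covers this, but the short version is that the kernel merges the read-only image layers with one writeable layer into a single view, so your software just sees one filesystem tree and automatically gets the copy from the upper (writeable) layer once a file has been modified. A minimal sketch of such an overlay mount done by hand, with placeholder directory names:

# lower = read-only image layers, upper = the container's writeable layer
mkdir -p /tmp/ovl/lower /tmp/ovl/upper /tmp/ovl/work /tmp/ovl/merged
echo "from the image" > /tmp/ovl/lower/app.conf
sudo mount -t overlay overlay \
  -o lowerdir=/tmp/ovl/lower,upperdir=/tmp/ovl/upper,workdir=/tmp/ovl/work \
  /tmp/ovl/merged
echo "modified at runtime" | sudo tee /tmp/ovl/merged/app.conf   # the write lands in upper/
cat /tmp/ovl/merged/app.conf                                     # readers see the modified copy
ls /tmp/ovl/upper/                                               # app.conf now lives here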

Is there a way to restrict untrusted container scheduler?

I have an application which I'd like to give the privilege to launch short-lived tasks and schedule these as docker containers. I was thinking of doing this simply via docker run.
As I want to make the attack surface as small as possible, I treat the application as untrusted. As such it can potentially run arbitrary docker run commands (if the codebase contained bug or the container was compromised, input was improperly escaped somewhere etc.) against a predefined docker API endpoint.
This is why I'd like to restrict that application (effectively a scheduler) in some ways:
prevent --privileged use
enforce --read-only flag
enforce memory & CPU limits
I looked at couple of options:
selinux
the SELinux policies would need to be set at the host level and then propagated into the containers via the --selinux-enabled flag at the daemon level. The scheduler can however override this anyway via docker run --privileged.
seccomp profiles
these are only applied at the time the container is launched (seccomp flags are available for docker run)
AppArmor
this can (again) be overridden at the scheduler level via --privileged
docker daemon --exec-opts flag
only a single option is actually available for this flag (native.cgroupdriver)
It seems that Docker is designed to trust container schedulers by default.
Does anyone know if this is a design decision?
Is there any other possible solution available w/ current latest Docker version that I missed?
I also looked at Kubernetes and its Limit Ranges & Resource Quotas, which can be applied to K8S namespaces; this looked interesting, assuming there's a way to force certain schedulers to only use certain namespaces. It would however increase the scope of this problem to operating a K8S cluster.
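For reference, a minimal sketch of the kind of LimitRange I had in mind (the namespace name and limit values are just placeholders, and the namespace is assumed to already exist):

# Apply default and maximum resource limits to the namespace the scheduler is allowed to use
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: task-limits
  namespace: scheduler-tasks
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: 512Mi
    defaultRequest:
      cpu: "250m"
      memory: 256Mi
    max:
      cpu: "1"
      memory: 1Gi
EOF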
Running Docker on a Unix platform should be compatible with nice, or so I would think at first. Looking a little more closely, it looks like you need something like --cpuset-cpus="0,1".
From the second link: "The --cpu-quota looks to be similar to the --cpuset-cpus ... allocate one or a few cores to a process, it's just time managed instead of processor number managed."
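Putting that together, the restrictions listed in the question can at least be expressed as docker run flags at launch time (whether an untrusted scheduler can be forced to pass them is of course the open problem; the image name below is just a placeholder):

# Pin the task to two cores, cap CPU time and memory, and keep the root filesystem read-only
docker run --rm \
  --cpuset-cpus="0,1" \
  --cpu-quota=50000 \
  --memory=512m \
  --read-only \
  --security-opt no-new-privileges \
  my-task-image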

Manage LXC containers from host in puppet

Are there any Puppet modules for managing nodes in LXC containers from the local host? E.g.: I have a host with 50 LXC containers and I want to manage all of those containers from the host directly, not from yet another LXC container running a puppetmaster.
You could use an existing Puppet module, although it doesn't seem to be well documented in terms of usage or features.
You'll probably have more luck if you use Docker as a wrapper around plain LXC.
The Docker Puppet module seems to have meaningful documentation, and Docker itself could help you manage these containers a bit more effectively in general.
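If you do go the Docker route, the module can be installed and exercised from the host directly; a minimal sketch, assuming the puppetlabs-docker module:

# Install the Docker module on the host and apply a one-off manifest that sets up Docker
puppet module install puppetlabs-docker
puppet apply -e 'include docker'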
