How to limit allocatable memory per node on Kubernetes? - azure

I would like to limit allocatable memory per node (VM) on Kubernetes.
Right now it seems that certain pods can grow past the memory limit of the VM, making it unresponsive, instead of being killed before that happens.

See Reserve Compute Resources for System Daemons.
With systemd, we can configure the kubelet's Node Allocatable feature like this:
$ cat > /etc/systemd/system/kubelet.service.d/20-node-eviction.conf <<EOF
Environment="KUBELET_EXTRA_ARGS=--eviction-hard=memory.available<500Mi --system-reserved=memory=1Gi"
EOF
$ systemctl daemon-reload
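To apply the change, restart the kubelet and then check what the node reports as Allocatable; a minimal sketch (the node name is a placeholder):
$ systemctl restart kubelet
$ kubectl describe node <node-name> | grep -A 6 Allocatable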

Related

Docker nproc limit has to be set, seemingly, too high in order for a container to run

I'm trying to debug some weird behavior of an image I don't own (GitHub repo with the image).
Running
docker run -it --ulimit nproc=100 --ulimit nofile=90:100 --network none --tmpfs /tmp:rw,noexec,nosuid,size=65536k --tmpfs /home/glot:rw,exec,nosuid,size=131072k --user=glot --read-only glot/python:latest /bin/bash
results in exec /bin/bash: resource temporarily unavailable.
However, if we bump nproc to 10000 it suddenly starts working (for me, even bumping it to 1000 results in the same error).
This image has no ps, but from what I can see in /proc, there are never more than 2 processes.
I'm not experienced with Linux and container limits, so any insights and comments are welcome.
P.S.
A bit of background: This image serves as a sandbox for executing fleeting snippets of code, and nproc limit alleviates the fork bombing problem.
From https://docs.docker.com/engine/reference/commandline/run/, on nproc usage:
Be careful setting nproc with the ulimit flag as nproc is designed by Linux to set the maximum number of processes available to a user, not to a container. For example, start four containers with daemon user:
docker run -d -u daemon --ulimit nproc=3 busybox top
docker run -d -u daemon --ulimit nproc=3 busybox top
docker run -d -u daemon --ulimit nproc=3 busybox top
docker run -d -u daemon --ulimit nproc=3 busybox top
The 4th container fails and reports “[8] System error: resource temporarily unavailable” error. This fails because the caller set nproc=3 resulting in the first three containers using up the three processes quota set for the daemon user.
As the comment from @Philippe says, ulimit metrics are read per user, system-wide.
The problem was that the user created for the image shared the same UID as the main user on the host, albeit with a different username. When the nproc limit was enforced in the container, the total number of processes for this UID was taken into account (including all the processes of the local host user). And since this was run in a desktop environment with many running processes, it is no surprise that it broke the hard limit of 100 (or even 1000) on the number of processes.
Be careful with ulimits and UIDs: they are not encapsulated per container but rather shared system-wide. A user with a different username but the same UID in the container and on the host is treated as the same user when enforcing ulimits inside the container.
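One quick way to check whether you are in this situation is to compare your host user's UID with the UID the image's user resolves to, and to count how many processes that UID already owns on the host. A rough sketch, assuming id is available inside the image:
id -u                                                  # your UID on the host
ps -o pid= -u "$(id -u)" | wc -l                       # processes already owned by that UID
docker run --rm --user=glot glot/python:latest id -u   # UID the container user maps to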

Upgrading a leaf machine in MemSQL

My leaves are currently running on EC2 machines with 30 GB of RAM. Can I upgrade the same machines to 60 GB of RAM and ensure that the MemSQL leaf memory increases accordingly?
Yes, you certainly can.
If you are adding more memory to the same machines, you just need to:
Stop memsql: memsql-ops memsql-stop
Provision the new RAM on the machine
Start memsql: memsql-ops memsql-start
Configure the new memory limit: memsql-ops memsql-update-config --set-global --key maximum_memory --value value_in_mb (see https://help.memsql.com/hc/en-us/articles/115002247706-How-do-I-change-MemSQL-s-memory-limits-after-changing-system-memory-capacity-)
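Put together, the same-machine path looks roughly like this; the maximum_memory value below is only an illustration for a 60 GB machine, leaving some headroom for the OS (see the linked article for the recommended value):
memsql-ops memsql-stop
# resize the instance to 60 GB RAM, then boot it back up
memsql-ops memsql-start
memsql-ops memsql-update-config --set-global --key maximum_memory --value 55000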
If you are switching to new machines instead of provisioning more memory on the same machines, then you can:
Deploy the new machines, install MemSQL on them, and add them to your cluster: https://docs.memsql.com/quickstarts/v5.8/quick-start-on-premises/#5-add-more-host-machines-and-memsql-nodes
Run memsql-ops cluster-manual-control --enable
Run REMOVE LEAF 'host':port for all the old machines that you now want to remove. This will move the data to the new nodes.
Run memsql-ops memsql-delete on each of the old leaf nodes that you just ran REMOVE LEAF on. This will delete the nodes which are now empty of data after the last step.
Run memsql-ops cluster-manual-control --disable
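As a sketch, retiring one old leaf looks like this; the host, port, and node identifier are placeholders, and REMOVE LEAF is issued through a SQL session on the master aggregator:
memsql-ops cluster-manual-control --enable
# in a SQL session on the master aggregator:
#   REMOVE LEAF 'old-leaf-host':3306;
memsql-ops memsql-delete <old-leaf-id>
memsql-ops cluster-manual-control --disable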

How to evaluate the CPU and memory usage of a specific command/Docker container in Linux?

I'm composing a YAML file for scripts that run in Docker and are orchestrated by Kubernetes. Is there a way to evaluate the resource utilization of a specific command or Docker container, or what's the best practice for setting the CPU and memory limits for pods?
Edit
Most of these scripts run for only a short time, so it's hard to capture the resource info. I'm trying to find a tool that reports the maximum CPU and memory usage, something that works like time, which prints out the execution time.
You can view statistics for container(s) using the docker stats command.
For example:
docker stats containera containerb
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
containera 0.00% 24.15 MB / 1.041 GB 2.32% 1.8 MB / 79.37 kB 0 B / 81.92 kB
containerb 0.00% 24.95 MB / 1.041 GB 2.40% 1.798 MB / 80.72 kB 0 B / 81.92 kB
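docker stats streams updates continuously by default; for short-lived containers a one-shot snapshot may be easier to capture:
docker stats --no-stream containera containerb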
Or, see processes running in a container using docker top <container>
docker top containera
UID PID PPID C STIME TTY TIME CMD
root 4558 2850 0 21:13 ? 00:00:00 sh -c npm install http-server -g && mkdir -p /public && echo "welcome to containera" > /public/index.html && http-server -a 0.0.0.0 -p 4200
root 4647 4558 0 21:13 ? 00:00:00 node /usr/local/bin/http-server -a 0.0.0.0 -p 4200
Limiting resources
Docker Compose (like Docker itself) allows you to set resource limits for a container, for example limiting the maximum amount of memory used, CPU shares, etc.
Read this section in the docker-compose yaml reference, and the docker run reference on "Runtime constraints on resources"
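For plain docker run, the equivalent constraints look something like this (the image name and values are only placeholders):
docker run --memory=512m --cpu-shares=512 my-script-image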
There are some good answers in the question: Peak memory usage of a linux/unix process
TL;DR: /usr/bin/time -v <command> or use valgrind.
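For a short-lived script you could, for example, wrap the command and pull out the peak resident set size; the script name here is just a stand-in:
/usr/bin/time -v python my_script.py 2>&1 | grep "Maximum resident set size"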
That should help you get an idea of how much memory you need to assign as the limit for your app, but CPU is a bit of a different beast. If your app is CPU bound, it will use all the CPU you give it no matter what limit you set. Also, in Kubernetes you assign cores (or millicores) to apps, so it's not always terribly useful to know what % of the CPU was used on any particular machine, as that won't readily translate to cores.
You should give your app as many CPU cores as you feel comfortable with and that allows your app to succeed in an acceptable amount of time. That will depend on cost and how many cores you have available in your cluster. It also depends a bit on the architecture of your app. For instance, if the application can't take advantage of multiple cores then there isn't much use in giving it more than 1.
In case you have any longer-running apps, you could try installing Kubedash. If you have Heapster installed, Kubedash uses the built-in Kubernetes metrics to show you average and max CPU/memory utilization. It helps a lot when trying to figure out what requests and limits to assign to a particular application.
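Once you have rough numbers, they translate into requests and limits on the pod spec. A minimal sketch with placeholder names and values:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-script
spec:
  containers:
  - name: my-script
    image: my-script-image
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
EOF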
Hope that helps!

How do I increase limits in my docker containers on CoreOS?

By default, CoreOS and the Linux kernel have some pretty conservative limits defined for things such as open files and locked memory.
So I upped the values in /etc/systemd/system.conf:
DefaultLimitNOFILE=500000
DefaultLimitMEMLOCK=unlimited
However, when I start my docker containers the limits are still low. ulimit -l prints 64.
Running ulimit -l unlimited prints an error:
ulimit: max locked memory: cannot modify limit: Operation not permitted
So I placed
LimitMAXLOCKED=unlimited
LimitNOFILE=64000
in my systemd unit file.
However, these values are not coming through to the docker container and calling them still doesn't work. I've rebooted the machine after changing the systemwide defaults.
This is probably more of a systemd thing. How do I fix this?
Docker containers are neither started nor managed by systemd; the Docker daemon is responsible for setting limits here. Here is an example taken from the official documentation:
# docker run --ulimit nofile=1024:1024 --rm debian sh -c "ulimit -n"
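If you want higher limits for every container by default rather than per docker run, the Docker daemon also supports default ulimits; a sketch assuming Docker reads /etc/docker/daemon.json and is managed by systemd, with values only as an illustration:
# cat > /etc/docker/daemon.json <<EOF
{
  "default-ulimits": {
    "nofile":  { "Name": "nofile",  "Soft": 64000, "Hard": 64000 },
    "memlock": { "Name": "memlock", "Soft": -1, "Hard": -1 }
  }
}
EOF
# systemctl restart docker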

Docker + Cassandra ulimit error

I am trying to start a Cassandra (not DSC) server in Docker (Ubuntu 14.04). When I run service cassandra start (as root), I get
/etc/init.d/cassandra: 82: ulimit: error setting limit (Operation not permitted)
line 82 of that file is
ulimit -l unlimited
I'm not really sure what I need to change it to.
I would expect you to get that warning but that Cassandra would continue to start up and run correctly. As pointed out in the other answer, Docker restricts certain operations for safety reasons. In this case, the Cassandra init script is trying to allow unlimited locked memory. Assuming you are running with swap disabled (as is a Cassandra best practice), you can safely ignore this error.
I run Cassandra in Docker for my development environment and also get this warning, but Cassandra starts and runs just fine. If it is not starting up, check the cassandra log files for another problem.
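If you do need the higher memlock limit inside the container rather than just ignoring the warning, one option is to grant it when starting the container; a sketch, with the image name as a placeholder:
docker run -d --ulimit memlock=-1:-1 --cap-add=IPC_LOCK my-cassandra-image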
A short intro into ulimit: RESOURCE LIMITS ON UNIX SYSTEMS (ULIMIT).
The command this init script is trying to issue is supposed to set the max locked memory limit to, well, unlimited. It should succeed for root. Does whoami print root?
UPD: further research led me to this Google Groups discussion. Hopefully it will clarify things a bit.
/etc/init.d/cassandra start/restart/status will not work because the init system is not running inside the container, so the available option is to restart the container:
docker restart "container id or container name"
