I am aware that I can limit the resources allocated to a container while provisioning using docker with the -c and -m flags for CPU and memory.
However, is there a way I can change these allocated resources to containers dynamically (after they have been provisioned) and without redeploying the same container with changed resources?
Docker, at the time of writing (v1.11.1), has the command docker update (see the docs). With it you can change allocated resources on the fly.
Usage: docker update [OPTIONS] CONTAINER [CONTAINER...]
Update configuration of one or more containers
--blkio-weight Block IO (relative weight), between 10 and 1000
--cpu-shares CPU shares (relative weight)
--cpu-period Limit CPU CFS (Completely Fair Scheduler) period
--cpu-quota Limit CPU CFS (Completely Fair Scheduler) quota
--cpuset-cpus CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems MEMs in which to allow execution (0-3, 0,1)
--help Print usage
--kernel-memory Kernel memory limit
-m, --memory Memory limit
--memory-reservation Memory soft limit
--memory-swap Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--restart Restart policy to apply when a container exits
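For example, to raise the memory limit and lower the CPU weight of a running container on the fly (container name and values here are only placeholders):

# hypothetical container name; adjust the values to your needs
docker update --memory 1g --memory-swap 1g --cpu-shares 512 my_container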
Not at present, no. There is a desire to see someone implement it, though: https://github.com/docker/docker/issues/6323
That could be coming for docker 1.10 or 1.11 (Q1 2016): PR 15078 is implementing (Dec. 2015) support for changing resources (including CPU) both for stopped and running containers.
Update 2016: it is part of docker 1.10 and documented in docker update (PR 15078).
We decided to allow to set what we called resources, which consists of cgroup thingies for now, hence the following PR #18073.
The only mutable elements of a container are in HostConfig, specifically in Resources (see the struct).
resources := runconfig.Resources{
    BlkioWeight:       *flBlkioWeight,
    CpusetCpus:        *flCpusetCpus, // <== CPU-related
    CpusetMems:        *flCpusetMems, // <== CPU-related
    CPUShares:         *flCPUShares,  // <== CPU-related
    Memory:            flMemory,
    MemoryReservation: memoryReservation,
    MemorySwap:        memorySwap,
    KernelMemory:      kernelMemory,
    CPUPeriod:         *flCPUPeriod,
    CPUQuota:          *flCPUQuota,
}
The command was to be called docker set (in the end it became docker update).
The allowed changes are passed as flags, e.g. --memory=1Gb --cpushare=… (as this PR does).
There is one flag for each attribute of the Resources struct (and no more, no less).
Note that changes made via docker set should persist, i.e. they would be permanent (updated in the container's JSON).
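To verify that such a change persisted in the container's JSON, you can read the HostConfig back (container name is a placeholder):

docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.CpuShares}}' my_container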
Related
Good day. I know that Docker containers use the host's kernel (which is why containers are considered lightweight VMs); here is the source. However, after reading the Runtime Options part of the Docker documentation I came across an option called --kernel-memory. The docs say:
The maximum amount of kernel memory the container can use.
I don't understand what it does. My guess is that every container will allocate some memory in the host's kernel space. If so, what is the reason? Isn't it a vulnerability to let a user process allocate memory in kernel space?
The whole CPU/memory limitation machinery uses cgroups.
You can find all settings performed by docker run (whether passed as arguments or applied by default) under /sys/fs/cgroup/memory/docker/<container ID> for memory and /sys/fs/cgroup/cpu/docker/<container ID> for CPU.
So the --kernel-memory:
Reading: cat memory.kmem.limit_in_bytes
Writing: echo 2167483648 | sudo tee memory.kmem.limit_in_bytes
There are also the counters memory.kmem.usage_in_bytes and memory.kmem.max_usage_in_bytes, which (rather self-explanatorily) show the current usage and the highest usage seen so far.
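For a specific container this could look as follows, assuming cgroup v1 with the default /sys/fs/cgroup layout (container name is a placeholder):

CID=$(docker inspect --format '{{.Id}}' my_container)
cat /sys/fs/cgroup/memory/docker/$CID/memory.kmem.limit_in_bytes
cat /sys/fs/cgroup/memory/docker/$CID/memory.kmem.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/$CID/memory.kmem.max_usage_in_bytes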
CGroup docs about Kernel Memory
For the functionality itself I recommend reading the kernel docs for cgroups v1 instead of the Docker docs:
2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
With the Kernel memory extension, the Memory Controller is able to
limit the amount of kernel memory used by the system. Kernel memory is
fundamentally different than user memory, since it can't be swapped
out, which makes it possible to DoS the system by consuming too much
of this precious resource.
[..]
The memory used is
accumulated into memory.kmem.usage_in_bytes, or in a separate counter
when it makes sense. (currently only for tcp). The main "kmem" counter
is fed into the main counter, so kmem charges will also be visible
from the user counter.
Currently no soft limit is implemented for kernel memory. It is future
work to trigger slab reclaim when those limits are reached.
and
2.7.2 Common use cases
Because the "kmem" counter is fed to the main user counter, kernel
memory can never be limited completely independently of user memory.
Say "U" is the user limit, and "K" the kernel limit. There are three
possible ways limits can be set:
U != 0, K = unlimited:
This is the standard memcg limitation mechanism already present before kmem
accounting. Kernel memory is completely ignored.
U != 0, K < U:
Kernel memory is a subset of the user memory. This setup is useful in
deployments where the total amount of memory per-cgroup is overcommited.
Overcommiting kernel memory limits is definitely not recommended, since the
box can still run out of non-reclaimable memory.
In this case, the admin could set up K so that the sum of all groups is
never greater than the total memory, and freely set U at the cost of his
QoS.
WARNING: In the current implementation, memory reclaim will NOT be
triggered for a cgroup when it hits K while staying below U, which makes
this setup impractical.
U != 0, K >= U:
Since kmem charges will also be fed to the user counter and reclaim will be
triggered for the cgroup for both kinds of memory. This setup gives the
admin a unified view of memory, and it is also useful for people who just
want to track kernel memory usage.
Clumsy Attempt at a Conclusion
Given a running container started with --memory="2g" --memory-swap="2g" --oom-kill-disable, checking
cat memory.kmem.max_usage_in_bytes
10747904
shows roughly 10 MB of kernel memory in its normal state. It would make sense to me to limit it, say to 20 MB of kernel memory; the kernel should then kill or throttle the container to protect the host. But because, according to the docs, there is no way to reclaim that memory, and the OOM killer starts killing processes on the host even with plenty of free memory (according to this: https://github.com/docker/for-linux/issues/1001), for me it is rather impractical to use.
The quoted option to set it >= memory.limit_in_bytes is not really helpful in that scenario either.
Deprecated
--kernel-memory is deprecated as of v20.10, because someone (namely the Linux kernel developers) realized all of that as well.
What can we do then?
ULimit
The Docker API exposes HostConfig|Ulimit, which sets resource limits of the kind listed in /etc/security/limits.conf. For docker run the flag is --ulimit <type>=<soft>:<hard>. Check cat /etc/security/limits.conf or man setrlimit to see the categories. You can try to protect your system from filling kernel memory by, e.g., limiting runaway process creation with --ulimit nproc=500:500, but be careful: nproc works per user, not per container, so count across containers.
To prevent DoS (intentional or not) I would suggest limiting at least nofile and nproc. Maybe someone can elaborate further.
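A minimal sketch of what that could look like, with arbitrary limit values and a placeholder image name:

# placeholder image; tune the limits to your workload
docker run -d --ulimit nproc=500:500 --ulimit nofile=1024:2048 my_image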
sysctl:
docker run --sysctl can change kernel variables for message queues and shared memory, and also networking, e.g. docker run --sysctl net.ipv4.tcp_max_orphans= for orphaned TCP connections, which defaults on my system to 131072; at a kernel memory usage of roughly 64 kB each, that is 8 GB gone on a malfunction or a DoS. Maybe someone can elaborate further.
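A hedged example, again with an arbitrary value and a placeholder image name:

# placeholder image; value chosen purely for illustration
docker run -d --sysctl net.ipv4.tcp_max_orphans=4096 my_image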
I'm a little confused with what I'm seeing with a node process that I have running. docker stats on the host is showing that the container is using over 100% CPU. This makes me think that the node process is maxing out the CPU. This is confirmed when I run top on the host and see that the node process is using over 100% CPU.
When I jump into the docker container I see that node is only using 54% of the CPU and that the processing is split between the two cores. I was expecting to see one core maxed out and the other at 0 since Node is single-threaded.
I found this QA and it looks like the OS could be moving the process between the cores (news to me). Is This Single Node.JS App Using Multiple Cores?
Can you help me interpret the results? Is node pretty much maxed out? Or since the process in the container is showing 54% usage, can that go up to 100%? Why is top in the node container showing 54% usage for node but 45% + 46% across both cores? Nothing is running in the container but the single node process. I'm not using clustering, although maybe a package I have included is.
I'm asking all this as I'm trying to understand if I should be scaling this ECS instance out or if node can handle more.
Node.JS: 15.1.0
EC2 Instance: c5.large
NestJS: 7.3.1
Different tops
What you're seeing is (likely) due to a difference in flavors of top.
I'm going to take a wild guess and say that your Docker image is perhaps based on Alpine? The top command in Alpine is busybox. It reports the per-process CPU usage as a percentage of the TOTAL number of CPUs available (nCPUs * 100%).
This differs from most other flavors of top, which report the per-process CPU usage as a percentage of a SINGLE CPU.
Both tops show same thing: ~50% usage on each CPU
The two top screenshots are actually showing the same thing: node process is using about 50% of each of the 2 CPUs.
Testing theory
We can test this with the following:
# This will max out 1 cpu of the system
docker run --name stress --rm -d alpine sh -c 'apk add stress-ng && stress-ng --cpu 1'
# This shows the busybox top with usage as ratio of total CPUs
# press 'c' in top to see the per-CPU info at the top
docker exec -it stress top
# This will install and run procps top, with usage as a ratio of single CPU
docker exec -it stress sh -c 'apk add procps && /usr/bin/top'
In the screenshot above, we can see two different flavors of top. They are reporting the same CPU usage, but the upper one reports this as "100% CPU" (as a percentage of a single core), while the lower one reports it as 6% (1/16 cores = 6.25%).
What does this tell us about node's CPU usage?
Node is single-threaded, and cannot use more than 100% of a CPU. ...sort of. Under the hood, Node uses libuv, which does run its own pool of threads. This is how Node receives asynchronous events for IO operations, for example. These threads do use CPU and can push your CPU usage over 100%. Some packages are also written as add-ons to Node, and these also use threads.
The environment variable UV_THREADPOOL_SIZE limits the maximum number of libuv-controlled threads which may run simultaneously. Setting this to a larger number (default is 4) before running node may remove a bottleneck.
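For example, if the service runs in a container, the variable can be passed through the environment (image name, entry file and value below are placeholders):

docker run -d -e UV_THREADPOOL_SIZE=8 my-node-image
# or, outside a container:
UV_THREADPOOL_SIZE=8 node server.js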
If you are doing some CPU-intensive operations, consider using cluster, Worker Threads, writing your own add-on or spawning separate processes to do the computation.
I want to test Pod eviction events caused by MemoryPressure for taint-based eviction on my Pods. To do that I created memory load on my instance, which has 2 vCPUs and 8 GB of RAM.
To create the load I ran this command:
stress-ng --vm 2 --vm-bytes 10G --timeout 60s
Output of memory usage
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       2.7Gi       1.0Gi       3.9Gi       4.1Gi       984Mi
Swap:            0B          0B          0B
But my node's status shows no MemoryPressure. I have updated the kubelet eviction parameters as below:
evictionHard:
memory.available: "200Mi"
In summary, how can I create memory pressure on my worker nodes to test taint-based eviction?
Thanks
You could invoke the stress command multiple times. Check the script here.
The value for memory.available is derived from the cgroupfs instead of tools like free -m. This is important because free -m does not work in a container, and if users use the node allocatable feature, out of resource decisions are made local to the end user Pod part of the cgroup hierarchy as well as the root node. This script reproduces the same set of steps that the kubelet performs to calculate memory.available. The kubelet excludes inactive_file (i.e. # of bytes of file-backed memory on inactive LRU list) from its calculation as it assumes that memory is reclaimable under pressure.
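A rough sketch of that calculation, assuming cgroup v1 and reading the root memory cgroup on the node (this mirrors the linked script rather than replacing it):

#!/bin/bash
# memory.available ~ capacity - (usage - inactive_file), all in bytes
memory_capacity_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
memory_capacity=$((memory_capacity_kb * 1024))
memory_usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
inactive_file=$(grep total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')
memory_available=$((memory_capacity - (memory_usage - inactive_file)))
echo "memory.available: $((memory_available / 1024 / 1024)) Mi"

If the value printed here drops below your evictionHard threshold (200Mi above), the node should report MemoryPressure.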
I know you can do something like
docker build -c 2 .
to give the container 2 cores, but can you do something like giving the container 50% of the memory and 50% of the CPU?
Your example docker build -c 2 . doesn't actually do what you think it does. The -c flag assigns cpu-shares, which is a relative weighting with a default of 1024. So if another container is running with the default weighting and CPU usage is maxed out, your build container will only get 2/1026 of the CPU. If you want to use this mechanism to allocate CPU, you will need to do some maths based on the number of running containers and their existing weightings (e.g. if there are two containers running with the default weighting and you give a third container a weighting of 2048, it will get 2048/(2048+1024+1024), or 50%, of the CPU).
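For instance, the 50% scenario just described could be set up like this (image names are placeholders; the weighting only matters when the CPUs are fully contended):

docker run -d --cpu-shares 1024 image_a
docker run -d --cpu-shares 1024 image_b
docker run -d --cpu-shares 2048 image_c   # 2048/(2048+1024+1024) = 50% under contention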
You can also use the --cpuset-cpus argument to control which cores the container runs on, which I think is what you're thinking of, but that will only help you if set it for all containers.
I think what you're actually after is the --cpu-quota setting, which uses the Completely Fair Scheduler in the Linux kernel. The period is set to 100000 microseconds (100 ms) by default, meaning the argument --cpu-quota=50000 should give the container 50% of 1 CPU.
Regarding memory, you can only set a maximum usage for each container, you can't allocate a percentage slice.
For full details on all of this, see https://docs.docker.com/reference/run/#runtime-constraints-on-resources
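Putting that together, a hedged example that caps a container at roughly half of one CPU and 512 MB of memory (image name is a placeholder):

docker run -d --cpu-period=100000 --cpu-quota=50000 -m 512m my_image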
Through some longevity testing with Docker (Docker 1.5 and 1.6 with no memory limit) on CentOS 7 / RHEL 7, and observing the systemd-cgtop stats for the running containers, I noticed what appeared to be very high memory use. Typically the particular application, running in a non-containerized state, only utilizes around 200-300 MB of memory. Over a 3 day period I ended up seeing systemd-cgtop reporting that my container was using up to 13G of memory. While I am not an expert Linux admin by any means, I started digging into this, which pointed me to the following articles:
https://unix.stackexchange.com/questions/34795/correctly-determining-memory-usage-in-linux
http://corlewsolutions.com/articles/article-6-understanding-the-free-command-in-ubuntu-and-linux
So what I understand is that to determine the actual free memory in the system, one should look at the -/+ buffers/cache: line of "free -m" and not the top line. I also noticed that the top line of "free -m" constantly shows increasing used memory and decreasing free memory, just like what I am observing with my container through systemd-cgtop, whereas the -/+ buffers/cache: line shows stable amounts of memory used and free. Also, if I observe the actual process within top on the host, I can see the process itself is only ever using less than 1% of memory (0.8% of 32G).
I am a bit confused as to what's going on here. If I set a memory limit of 500-1000M for a container (I believe it would turn out to be twice as much due to the swap), would my process eventually stop when it reaches the memory limit, even though the process itself is not using anywhere near that much memory? If anybody out there has any feedback on this, that would be great. Thanks!
I used Docker on CentOS 7 for a while and was confused by the same thing. Checking the GitHub issue linked below, it looks like docker stats in that release is kind of misleading.
https://github.com/docker/docker/issues/10824
so I just ignored the memory usage reported by docker stats.
A year since you asked, but adding an answer here for anyone else interested. If you set a memory limit, I think the container would not be killed unless it fails to reclaim unused memory. The cgroups metrics, and consequently docker stats, show page cache + RES. You could look at the detailed cgroups metrics to see the breakdown.
I had a similar issue, and when I tested with a memory limit, I saw that the container was not killed; rather, the memory was reclaimed and reused.
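To see that breakdown yourself, assuming cgroup v1 (container name is a placeholder), the per-container memory.stat reports cache and rss separately:

CID=$(docker inspect --format '{{.Id}}' my_container)
grep -E '^(cache|rss) ' /sys/fs/cgroup/memory/docker/$CID/memory.stat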