Docker not reporting memory usage correctly? - linux

While doing some longevity testing with Docker (1.5 and 1.6, with no memory limit) on CentOS 7 / RHEL 7 and observing the systemd-cgtop stats for the running containers, I noticed what appeared to be very high memory use. Typically the particular application, running in a non-containerized state, only uses around 200-300 MB of memory. Over a 3-day period I ended up seeing systemd-cgtop report that my container was using up to 13 GB of memory. While I am not an expert Linux admin by any means, I started digging into this, which pointed me to the following articles:
https://unix.stackexchange.com/questions/34795/correctly-determining-memory-usage-in-linux
http://corlewsolutions.com/articles/article-6-understanding-the-free-command-in-ubuntu-and-linux
So basically what I understand is that, to determine the actual free memory within the system, I should look at the "-/+ buffers/cache:" line of "free -m" and not the top line. The top line of "free -m" constantly shows memory used increasing and free memory decreasing, just like what I am observing for my container through systemd-cgtop, whereas the "-/+ buffers/cache:" line shows a stable amount of memory used / free. Also, if I observe the actual process within top on the host, I can see the process itself only ever using less than 1% of memory (0.8% of 32 GB).
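For illustration, the layout I'm looking at is roughly this (the numbers are made up):

    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:         32109      31200        909        123        456      28000
    -/+ buffers/cache:       2744      29365
    Swap:         2047          0       2047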
I am a bit confused as to what's going on here. If I set a memory limit of 500-1000 MB for a container (I believe the effective limit would turn out to be twice as much because of swap), would my process eventually be stopped when it reaches the memory limit, even though the process itself is not using anywhere near that much memory? If anybody out there has any feedback on the above, that would be great. Thanks!
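Edit: for concreteness, the sort of limit I have in mind (assuming I have the flags right for this Docker version; "myapp" and "myimage" are placeholders) is:

    # -m caps the container's memory cgroup; without an explicit --memory-swap,
    # the memory+swap limit defaults to twice the -m value
    docker run -d -m 512m --name myapp myimage
    # or pin memory+swap explicitly
    docker run -d -m 512m --memory-swap 1g --name myapp myimage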

I used Docker on CentOS 7 for a while and was confused by the same thing. Judging by the GitHub issue below, docker stats in this release is somewhat misleading:
https://github.com/docker/docker/issues/10824
so I just ignored the memory usage reported by docker stats.

A year after you asked, but adding an answer here for anyone else interested. If you set a memory limit, I believe the container would not be killed unless the kernel fails to reclaim unused memory. The cgroup metrics, and consequently docker stats, report page cache + RSS. You can look at the detailed cgroup metrics to see the breakdown.
I had a similar issue, and when I tested with a memory limit I saw that the container was not killed; instead the cached memory was reclaimed and reused.
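For example, on a cgroup v1 host you can see the cache vs. RSS split for a container directly; the exact path under /sys/fs/cgroup/memory depends on your distro and cgroup driver, and "myapp" here is a placeholder:

    CID=$(docker inspect --format '{{.Id}}' myapp)
    # systemd cgroup driver layout (common on CentOS/RHEL 7); with the cgroupfs
    # driver the path is /sys/fs/cgroup/memory/docker/$CID/memory.stat instead
    grep -E '^(total_)?(cache|rss) ' \
        /sys/fs/cgroup/memory/system.slice/docker-$CID.scope/memory.stat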

Related

What is the relation between `container_memory_working_set_bytes` metric and OOM-killer on the container?

I'm trying to understand how the OOM killer works for containers.
To figure it out, I've read a lot of articles and found that the OOM killer kills containers based on oom_score, and oom_score is determined by oom_score_adj and the memory usage of the process.
There are two metrics from cAdvisor for monitoring a container's memory usage: container_memory_working_set_bytes and container_memory_rss.
It seems that RSS (container_memory_rss) has an impact on oom_score, so I can understand that if container_memory_rss reaches the memory limit, the OOM killer will kill the process.
https://github.com/torvalds/linux/blob/v3.10/fs/proc/base.c#L439
https://github.com/torvalds/linux/blob/v3.10/mm/oom_kill.c#L141
https://github.com/torvalds/linux/blob/v3.10/include/linux/mm.h#L1136
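For reference, those per-process inputs can be read directly on the node (<pid> being the process's host PID):

    cat /proc/<pid>/oom_score      # the final score the OOM killer compares
    cat /proc/<pid>/oom_score_adj  # the adjustment set by the runtime/kubelet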
But then articles like the ones below claim:
https://faun.pub/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d
https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e66
The better metric is container_memory_working_set_bytes as this is what the OOM killer is watching for.
I cannot understand why the OOM killer watches the container's working set memory. I don't think I understand the meaning of working set memory for a container, which is 'total usage - inactive file'.
https://github.com/google/cadvisor/issues/2582#issuecomment-644883028
Where can I find the reference? Or could you explain the relationship between working set memory and OOM-kill on the container?
As you already know, container_memory_working_set_bytes is:
the amount of working set memory, which includes recently accessed memory, dirty memory, and kernel memory. The working set is therefore less than or equal to "usage".
container_memory_working_set_bytes is used for OOM decisions because it excludes cached data (the Linux page cache) that can be evicted under memory pressure.
So, if container_memory_working_set_bytes grows to the limit, it will lead to an OOM kill.
When the Linux kernel checks available memory, it calls vm_enough_memory() to find out how many pages are potentially available.
When the machine is low on memory, old page frames, including cache, are reclaimed, but the kernel may still find that it is unable to free enough pages to satisfy a request. At that point it calls out_of_memory() to kill a process, and to determine the candidate process to be killed it uses oom_score.
So when the working set bytes reach the limit, it means the kernel cannot find available pages even after reclaiming old pages (including cache), and the kernel will trigger the OOM killer to kill a process.
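If you want to check this yourself, the working set number can be reproduced from the cgroup v1 files, which mirrors what cAdvisor computes (<cgroup> is the container's memory cgroup directory):

    usage=$(cat /sys/fs/cgroup/memory/<cgroup>/memory.usage_in_bytes)
    inactive_file=$(awk '/^total_inactive_file / {print $2}' \
        /sys/fs/cgroup/memory/<cgroup>/memory.stat)
    echo "working_set_bytes=$((usage - inactive_file))"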
You can find more details on the Linux kernel documents:
https://www.kernel.org/doc/gorman/html/understand/understand016.html
https://www.kernel.org/doc/gorman/html/understand/understand013.html

node.js heap memory and used heap size [pm2]

I am currently running node.js using pm2.
And recently, I was able to check "custom metrics" using the pm2 monit command.
Here, information such as Heap size, used heap size, and active requests are shown.
I don't know how the heap size is determined. Actually, I checked pm2 running on different servers.
Each was set to 95 MiB / 55 MiB, and accordingly the used heap size was different.
Also, is heap usage better the closer it is to 100%?
While searching on "StackOverflow" to find related information, I saw the following article.
What does Heap Usage mean in PM2
Also, what does "active requests" mean? It is continuously zero.
Thank you!
[Edit]
env : ubuntu18.04 [ ec2 - t3.micro ]
node version : v10.15
[Additional]
server memory : 1GB [ 40~50% used ]
cpu : vCPU (2) [ 1~2% used ]
The heap is the RAM used by the program you're asking PM2 to manage and monitor. Heap space, in Javascript and similar language runtimes, is allocated when your program creates objects and released upon garbage collection. Your runtime asks your OS for more heap space whenever it needs it: when active allocations exceed the free space. So your heap size will probably grow as your program starts up. That's normal.
Most programs allocate and release lots of objects as they do their work, so you should not try to optimize the % usage of your heap. When your program is running at a steady state – that is, after it has started up — you'll find the % utilization creeping up until garbage collection happens, and then dropping back. For example, a nodejs/express web server allocates req and res objects for each incoming request, then uses them, then drops them so the garbage collector can reclaim their RAM.
If your allocated heap size keeps growing, over minutes or hours, you probably have a memory leak. That is a programming bug: a problem you should do your best to solve. You should look up how that works for your application language. Other than that, don't worry too much about heap usage.
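If you want to see the raw numbers behind those metrics, they come from V8's own accounting, which any Node process can report; heapTotal/heapUsed are what "heap size" / "used heap size" style metrics are generally derived from:

    node -e 'console.log(process.memoryUsage())'
    # e.g. { rss: 23252992, heapTotal: 9682944, heapUsed: 5693632, external: 8272 }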
Active requests count work being done via various asynchronous objects like file writers and TCP connections. Unless your program is very busy it stays near zero.
Keep an eye on loop delay if your program does computations. If it creeps up, some computation function is hogging the Javascript event loop.

cgroup cpuacct.stat doesn't match /proc/pid/stat

Recently I did some perf testing of a process running in a Docker container. CPU usage was gathered from the cpuacct.stat file in the container's directory under /sys/fs/cgroup. When monitoring the process by hand with a tool such as top, I noticed that the cgroup CPU usage was always less than that shown by top, sometimes by a factor of four.
I wrote a quick script to track the ticks used by the process in the container as reported by both cpuacct.stat and /proc/<pid>/stat. They aren't even close. The proc stats are much larger, which doesn't make any sense; it's like saying the container is smaller than what's in it.
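For reference, a minimal sketch of that kind of comparison (cgroup v1 paths assumed, and they vary with the Docker/cgroup layout; <container-id> and <pid> are placeholders, and the /proc field numbering assumes the process name contains no spaces):

    CG=/sys/fs/cgroup/cpuacct/docker/<container-id>   # container's cpuacct cgroup
    PID=<pid>                                         # host PID of the process
    while true; do
        # cpuacct.stat: cgroup-wide user and system time, in USER_HZ ticks
        cg_ticks=$(awk '{sum += $2} END {print sum}' "$CG/cpuacct.stat")
        # /proc/<pid>/stat fields 14 and 15: utime and stime, also in ticks
        proc_ticks=$(awk '{print $14 + $15}' /proc/$PID/stat)
        echo "$(date +%s) cgroup=$cg_ticks proc=$proc_ticks"
        sleep 5
    done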
The only reference to any of this I could find was a kernel comment saying that cpuacct.stat might be a little inaccurate once in a while. This is more than a rounding error.
Anyone have any experience or knowledge about this? It kind of throws all of my CPU usage metrics into doubt.
Linux 3.10.0-327.18.2.el7.x86_64 Centos on an 8 cpu box.

linux CPU cache slowdown

We're getting overnight lockups on our embedded (Arm) linux product but are having trouble pinning it down. It usually takes 12-16 hours from power on for the problem to manifest itself. I've installed sysstat so I can run sar logging, and I've got a bunch of data, but I'm having trouble interpreting the results.
The targets only have 512 MB of RAM (we have other models with 1 GB, which see this issue much less often), and have no disk swap files, to avoid wearing the eMMCs.
Some kind of paging / virtual memory event is initiating the problem. In the sar logs, pgpgin/s, pgscand/s, pgsteal/s and majflt/s all increase steadily before snowballing to crazy levels. This pushes the CPU load up to correspondingly high levels (30-60 on dual-core Arm chips). At the same time, the frmpg/s values go very negative, whilst campg/s goes highly positive. The upshot is that the system is trying to allocate a large number of cache pages all at once. I don't understand why this would be.
The target then essentially locks up until it's rebooted or someone kills the main GUI process or it crashes and is restarted (We have a monolithic GUI application that runs all the time and generally does all the serious work on the product). The network shuts down, telnet blocks forever, as do /proc filesystem queries and things that rely on it like top. The memory allocation profile of the main application in this test is dominated by reading data in from file and caching it as textures in video memory (shared with main RAM) in an LRU using OpenGL ES 2.0. Most of the time it'll be accessing a single file (they are about 50Mb in size), but I guess it could be triggered by having to suddenly use a new file and trying to cache all 50Mb of it all in one go. I haven't done the test (putting more logging in) to correlate this event with these system effects yet.
The odd thing is that the actual free and cached RAM levels don't show an obvious lack of memory (I have seen the oom-killer swoop in and kill the main application with >100 MB free and 40 MB of cache RAM). The main application's memory usage seems reasonably well-behaved, with a VmRSS value that seems pretty stable. Valgrind hasn't found any progressive leaks that would happen during operation.
The behaviour seems like that of a system frantically swapping out to disk and making everything run dog slow as a result, but I don't know if this is a known effect in a free<->cache RAM exchange system.
My problem is superficially similar to the question "linux high kernel cpu usage on memory initialization", but that issue seemed to be driven by disk swap file management. However, dirty page flushing does seem plausible for my issue.
I haven't tried playing with the various vm files under /proc/sys/vm yet. vfs_cache_pressure and possibly swappiness would seem good candidates for some tuning, but I'd like some insight into good values to try here. vfs_cache_pressure seems ill-defined as to what the difference between setting it to 200 as opposed to 10000 would be quantitatively.
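For reference, the knobs in question and how to experiment with them (the values below are only there to show the mechanism, not recommendations):

    sysctl vm.vfs_cache_pressure vm.swappiness   # show current values
    sysctl -w vm.vfs_cache_pressure=200          # reclaim dentry/inode caches more aggressively
    sysctl -w vm.swappiness=10                   # largely moot here with no swap configured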
The other interesting fact is that it is a progressive problem. It might take 12 hours for the effect to happen the first time. If the main app is killed and restarted, it seems to happen every 3 hours after that. A full cache purge might push this back out, though.
Here's a link to the log data, with two files: sar1.log, the complete output of sar -A, and overview.log, an extract of free / cache memory, CPU load, MainGuiApp memory stats, and the -B and -R sar outputs for the interesting period between midnight and 3:40am:
https://drive.google.com/folderview?id=0B615EGF3fosPZ2kwUDlURk1XNFE&usp=sharing
So, to sum up, what's my best plan here? Tune vm to tend to recycle pages more often to make it less bursty? Are my assumptions about what's happening even valid given the log data? Is there a cleverer way of dealing with this memory usage model?
Thanks for your help.
Update 5th June 2013:
I've tried the brute-force approach and put a script in place which echoes 3 to /proc/sys/vm/drop_caches every hour. This seems to be maintaining the steady state of the system right now, and the sar -B stats stay on the flat portion, with very few major faults and 0.0 pgscand/s. However, I don't understand why keeping the cache RAM very low mitigates a problem where the kernel is trying to add the universe to cache RAM.
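For reference, the hourly drop_caches job mentioned above boils down to this, run as root:

    sync                               # flush dirty pages first
    echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes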

Swappiness in JVM

I stumbled upon an interesting problem last week when I noticed that one of our production servers, which runs Apache HTTP Server in front of Tomcat, stopped responding and reported an HTTP outage.
On further investigation it seemed that the JVM was causing memory pages to be swapped out quickly. This resulted in the swap space being completely filled, causing a memory issue the next time a page was moved to swap.
Investigating further, it seems that there is a JVM swappiness factor that is set to 60 by default on some of our Linux distributions. Based on some research, this could be too high a value for a high-traffic web service. Our swap space was set to 2 GB.
The swap details are:
Filename     Type        Size      Used      Priority
/dev/sda3    partition   2096472   1261420   -1
From /proc/meminfo
SwapCached: 944668 kB
Our JVM properties are as follows:
-Xmx6g -Xms4g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:PermSize=512M -XX:MaxPermSize=1024M -XX:NewSize=2g -XX:MaxNewSize=2g -XX:ParallelGCThreads=8
The server runs with 12 GB RAM.
Swapping does not go well with the JVM's GC process, so I tried reducing the swappiness to 0, but that did not change anything. We still see cases where the entire swap space is consumed, resulting in an OutOfMemoryError.
How can I tune the JVM performance?
The swappiness factor is actually a system-wide setting in Linux, not JVM-specific.
My personal experience with some kinds of applications is that they need swap to be turned off in order to work without hiccups, period. While I can't say for sure if your application belongs to this category, I have seen apps being swapped out despite enough free RAM being available. As you pointed out, GC and swap don't mix well. Swappiness is just an indication to the OS, so its impact on how much gets swapped out may be larger or smaller depending on circumstances. My suggestion would be to try turning swap completely off.
In order to do this, you need to be root or have sudo access. Comment out the line describing swap in /etc/fstab, which will prevent swap from being turned on after a reboot. Then, in order to turn swap off for the current server run, run swapoff -a. This may take up to a few minutes if there's data in swap that needs to be pulled back into RAM. Then, check the last line of the output of free to ensure that the total size of available swap is 0. After this, observe your app in order to determine if turning swap off solved your issue.
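A rough sketch of that procedure (run as root; check that the sed pattern only matches your swap entry before relying on it):

    sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab   # comment out swap for future boots
    swapoff -a                                 # turn swap off now; may take a few minutes
    free -m                                    # the Swap: line should now show 0 total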
If your physical RAM is 12 GB and your heap is set to only 6 GB (max), then you have plenty of RAM. Is this server dedicated to the Tomcat instance?
Take a look at the verbose GC log files to see what the memory usage is over time. Check your access log files to confirm whether the access requests have increased over time.
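If verbose GC logging isn't already enabled, on a JVM of this vintage (Java 7/8 with CMS) it is typically switched on with flags along these lines, for example via JAVA_OPTS or CATALINA_OPTS in Tomcat's setenv.sh (the log path is just an example):

    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/tomcat/gc.log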
