cgroup cpuacct.stat doesn't match /proc/pid/stat

I recently did some performance testing of a process running in a Docker container. CPU usage was gathered via the cpuacct.stat file found in the container's directory under /sys/fs. When monitoring the process by hand with a tool such as top, I noticed that the cgroup CPU usage was always less than that shown by top, sometimes by a factor of four.
I wrote a quick script to track the ticks used by the process in the container as reported by both cpuacct.stat and /proc//stat. They aren't even close. The proc stats are much larger, which doesn't make any sense; it's like saying the container is smaller than what's in it.
The only reference to any of this I could find was a kernel comment saying that cpuacct.stat might be a little inaccurate once in a while. This is more than a rounding error.
Does anyone have any experience or knowledge about this? It kind of throws all of my CPU usage metrics into doubt.
Kernel: Linux 3.10.0-327.18.2.el7.x86_64 (CentOS 7) on an 8-CPU box.
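A tick comparison like the one described above might look like the following minimal Python sketch. It assumes cgroup v1 with the cpuacct controller, where cpuacct.stat holds "user" and "system" lines in USER_HZ ticks, the same unit as utime/stime (fields 14 and 15) in /proc/<pid>/stat. The helper names and the arguments passed to sample_deltas are hypothetical placeholders.

```python
import time
from pathlib import Path


def parse_cpuacct_stat(text):
    """Parse cgroup v1 cpuacct.stat ("user N" / "system N") into a dict of ticks."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            ticks[parts[0]] = int(parts[1])
    return ticks


def parse_proc_stat_ticks(text):
    """Return (utime, stime) in clock ticks from the contents of /proc/<pid>/stat.

    The command name (field 2) is in parentheses and may contain spaces, so only
    the text after the last ')' is split. Field 3 (state) becomes index 0, which
    puts utime (field 14) at index 11 and stime (field 15) at index 12.
    """
    rest = text[text.rindex(")") + 1:].split()
    return int(rest[11]), int(rest[12])


def sample_deltas(cgroup_dir, pid, interval=5.0):
    """Sample both sources twice and return (cgroup_ticks, proc_ticks) used in between."""
    cg_file = Path(cgroup_dir) / "cpuacct.stat"
    proc_file = Path(f"/proc/{pid}/stat")

    cg0 = parse_cpuacct_stat(cg_file.read_text())
    p0 = parse_proc_stat_ticks(proc_file.read_text())
    time.sleep(interval)
    cg1 = parse_cpuacct_stat(cg_file.read_text())
    p1 = parse_proc_stat_ticks(proc_file.read_text())

    cg_delta = (cg1["user"] - cg0["user"]) + (cg1["system"] - cg0["system"])
    proc_delta = (p1[0] - p0[0]) + (p1[1] - p0[1])
    return cg_delta, proc_delta
```

Comparing deltas over the same interval, rather than absolute totals, avoids mixing in CPU time the cgroup accumulated before the process started.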

Related

Docker not reporting memory usage correctly?

Through some longevity testing with Docker (Docker 1.5 and 1.6 with no memory limit) on CentOS 7 / RHEL 7, and observing the systemd-cgtop stats for the running containers, I noticed what appeared to be very high memory use. Typically the application, when not containerized, only uses around 200-300 MB of memory. Over a 3-day period I ended up seeing systemd-cgtop report that my container was using up to 13 GB of memory. While I am not an expert Linux admin by any means, I started digging into this, which pointed me to the following articles:
https://unix.stackexchange.com/questions/34795/correctly-determining-memory-usage-in-linux
http://corlewsolutions.com/articles/article-6-understanding-the-free-command-in-ubuntu-and-linux
So basically, what I understand is that to determine the actual free memory on the system, I should look at the "-/+ buffers/cache:" line of "free -m" rather than the top line. I also noticed that the top line of "free -m" constantly showed increasing memory used and decreasing free memory, just like what I am observing with my container through systemd-cgtop. If I observe the "-/+ buffers/cache:" line, I see stable amounts of memory used and free. Also, if I observe the actual process in top on the host, I can see that the process itself only ever uses less than 1% of memory (0.8% of 32 GB).
I am a bit confused as to what's going on here. If I set a memory limit of 500-1000M for a container (I believe it would turn out to be twice as much due to swap), would my process eventually be stopped when I reach the memory limit, even though the process itself is not using anywhere near that much memory? If anybody out there has any feedback on this, that would be great. Thanks!
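The "-/+ buffers/cache:" figures can be recomputed directly from /proc/meminfo. This is a minimal sketch assuming the formula older versions of free used (used minus Buffers and Cached); newer procps-ng versions drop that line and show an "available" column instead, so treat it as an approximation. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def buffers_cache_line(info):
    """Return (used_kb, free_kb) as in the '-/+ buffers/cache:' row of older free."""
    used = info["MemTotal"] - info["MemFree"]
    cache = info["Buffers"] + info["Cached"]
    return used - cache, info["MemFree"] + cache
```

Calling buffers_cache_line(parse_meminfo(open("/proc/meminfo").read())) on the host should give figures that stay flat while the top-line "used" keeps growing as the page cache fills.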

I used Docker on CentOS 7 for a while and was confused by the same thing. Judging from the GitHub issue below, docker stats in this release is somewhat misleading.
https://github.com/docker/docker/issues/10824
So I just ignored the memory usage reported by docker stats.

A year since you asked, but I'm adding an answer here for anyone else interested. If you set a memory limit, I think the container would not be killed unless the kernel fails to reclaim unused memory. The cgroup metrics, and consequently docker stats, show page cache + RES. You can look at the detailed cgroup metrics to see the breakdown.
I had a similar issue and when I tested with a memory limit, I saw that the container is not killed. Rather the memory is reclaimed and reused.
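To see that breakdown, the cgroup's memory.stat can be split into page cache and everything else. This sketch assumes the cgroup v1 memory controller (as on CentOS 7), with memory.usage_in_bytes and memory.stat in the container's cgroup directory; the kernel documents usage_in_bytes as approximate, so the result is only rough. Function names are hypothetical.

```python
def parse_memory_stat(text):
    """Parse cgroup v1 memory.stat ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def split_usage(usage_in_bytes, stats):
    """Split memory.usage_in_bytes into page cache and non-cache memory.

    Prefers the hierarchical total_cache key (includes child cgroups) and falls
    back to cache when it is absent.
    """
    cache = stats.get("total_cache", stats.get("cache", 0))
    return {"cache": cache, "non_cache": usage_in_bytes - cache}
```

A large "cache" share with a small "non_cache" share would match the 13 GB reported by systemd-cgtop versus the 0.8% seen in top.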

linux CPU cache slowdown

We're getting overnight lockups on our embedded (Arm) linux product but are having trouble pinning it down. It usually takes 12-16 hours from power on for the problem to manifest itself. I've installed sysstat so I can run sar logging, and I've got a bunch of data, but I'm having trouble interpreting the results.
The targets only have 512 MB of RAM (we have other models with 1 GB, but they see this issue much less often), and they have no disk swap files, to avoid wearing out the eMMCs.
Some kind of paging / virtual memory event is initiating the problem. In the sar logs, pgpgin/s, pgscand/s, pgsteal/s and majflt/s all increase steadily before snowballing to crazy levels. This pushes the CPU load up to correspondingly high levels (30-60 on dual-core Arm chips). At the same time, the frmpg/s values go very negative, while campg/s goes highly positive. The upshot is that the system is trying to allocate a large number of cache pages all at once. I don't understand why this would be.
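The same kind of per-second rates can be logged at a finer granularity by diffing two snapshots of /proc/vmstat. This is a sketch under assumptions: the counter prefixes used (pgmajfault, pgscan_direct, pgsteal, pgpgin) are typical names, but they vary between kernel versions (some split them per zone), so check /proc/vmstat on the target. Summing by prefix is a choice made here to cope with that variation.

```python
import time
from pathlib import Path


def parse_vmstat(text):
    """Parse /proc/vmstat ("name value" per line) into a dict of counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def rates(before, after, seconds, prefixes):
    """Per-second rate for each prefix, summing every counter that starts with it."""
    result = {}
    for prefix in prefixes:
        delta = sum(after[k] - before.get(k, 0) for k in after if k.startswith(prefix))
        result[prefix] = delta / seconds
    return result


def sample(interval=10.0, prefixes=("pgmajfault", "pgscan_direct", "pgsteal", "pgpgin")):
    """Take two snapshots `interval` seconds apart and return the rates."""
    vmstat = Path("/proc/vmstat")
    before = parse_vmstat(vmstat.read_text())
    time.sleep(interval)
    after = parse_vmstat(vmstat.read_text())
    return rates(before, after, interval, prefixes)
```

Logging sample() in a loop alongside the application's own file-loading events could help correlate the snowballing with a specific 50 MB file load.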
The target then essentially locks up until it's rebooted, someone kills the main GUI process, or it crashes and is restarted (we have a monolithic GUI application that runs all the time and generally does all the serious work on the product). The network shuts down, and telnet blocks forever, as do /proc filesystem queries and things that rely on them, like top. The memory allocation profile of the main application in this test is dominated by reading data in from file and caching it as textures in video memory (shared with main RAM) in an LRU, using OpenGL ES 2.0. Most of the time it'll be accessing a single file (they are about 50 MB in size), but I guess it could be triggered by suddenly having to use a new file and trying to cache all 50 MB of it in one go. I haven't yet done the test (adding more logging) to correlate this event with these system effects.
The odd thing is that the actual free and cached RAM levels don't show an obvious lack of memory (I have seen the oom-killer swoop in and kill the main application with >100 MB free and 40 MB of cache RAM). The main application's memory usage seems reasonably well-behaved, with a VmRSS value that seems pretty stable. Valgrind hasn't found any progressive leaks that would happen during operation.
The behaviour seems like that of a system frantically swapping out to disk and making everything run dog slow as a result, but I don't know if this is a known effect in a free<->cache RAM exchange system.
My problem is superficially similar to the question "linux high kernel cpu usage on memory initialization", but that issue seemed to be driven by disk swap file management. However, dirty page flushing does seem plausible for my issue.
I haven't tried playing with the various files under /proc/sys/vm yet. vfs_cache_pressure and possibly swappiness would seem good candidates for tuning, but I'd like some insight into good values to try. The description of vfs_cache_pressure doesn't make clear what the quantitative difference would be between setting it to 200 and setting it to 10000.
The other interesting fact is that it is a progressive problem. It might take 12 hours for the effect to happen the first time. If the main app is killed and restarted, it seems to happen every 3 hours after that. A full cache purge might push this back out, though.
Here's a link to the log data, with two files: sar1.log, which is the complete output of sar -A, and overview.log, an extract of free/cache memory, CPU load, MainGuiApp memory stats, and the -B and -R sar outputs for the interesting period between midnight and 3:40am:
https://drive.google.com/folderview?id=0B615EGF3fosPZ2kwUDlURk1XNFE&usp=sharing
So, to sum up, what's my best plan here? Tune vm to tend to recycle pages more often to make it less bursty? Are my assumptions about what's happening even valid given the log data? Is there a cleverer way of dealing with this memory usage model?
Thanks for your help.
Update 5th June 2013:
I've tried the brute-force approach and added a script which echoes 3 to /proc/sys/vm/drop_caches every hour. This seems to be maintaining the steady state of the system right now, and the sar -B stats stay on the flat portion, with very few major faults and 0.0 pgscand/s. However, I don't understand why keeping the cache RAM very low mitigates a problem where the kernel is trying to add the universe to cache RAM.
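The hourly workaround can be written as a small Python loop. Per the kernel's drop_caches documentation, 1 drops the page cache, 2 drops reclaimable slab objects (dentries and inodes), and 3 drops both; only clean pages are freed, so this sketch calls os.sync() first. The function names are hypothetical, the path is a parameter so it can be tested against an ordinary file, and writing to the real file requires root.

```python
import os
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(path=DROP_CACHES, level=3):
    """Write back dirty pages, then ask the kernel to drop clean caches.

    level 1 = page cache, 2 = reclaimable slab (dentries and inodes), 3 = both.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # drop_caches only frees clean pages, so flush dirty ones first
    with open(path, "w") as f:
        f.write(f"{level}\n")


def run_forever(period_seconds=3600, path=DROP_CACHES):
    """Repeat drop_caches() every period_seconds (hourly by default)."""
    while True:
        drop_caches(path)
        time.sleep(period_seconds)
```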

Linux memory usage history

I had a problem in which my server began failing some of its normal processes and checks because the server's memory was completely full.
I looked in the log history and found that the processes that were killed were some Java processes.
I used the "top" command to see which processes are taking up the most memory right now (after the issue was fixed), and it was a Java process. So in essence, I can tell which processes are taking up the most memory right now.
What I want to know is whether there is a way to see which processes were taking up the most memory at the time the failures started. Perhaps Linux keeps a log of memory usage at particular times? I really have no idea, but it would be great if I could see that kind of detail.

@Andy has answered your question. However, I'd like to add that, for future reference, you should use a monitoring tool, something like these. They will show you what happened during a crash, since you obviously cannot watch all your servers all the time. Hope it helps.

Are you saying the kernel OOM killer went off? What does the log in dmesg say? Note that you can constrain a JVM to a fixed heap size (for example with -Xmx), which means it will fail affirmatively when full instead of letting the kernel kill something else. But the general answer to your question is no: there's no way to reliably run anything at the time of an OOM failure, because the system is out of memory! At best, you can use a separate process to poll the process table and log process sizes to catch memory leak conditions, etc.
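The separate polling process suggested above could be as simple as the following sketch, which reads Name and VmRSS from each /proc/<pid>/status and appends the largest processes to a log file. The function names and the log file name memlog.txt are hypothetical; the last entries before an OOM kill should show which processes were growing.

```python
import os
import time


def parse_status(text):
    """Extract Name and VmRSS (kB) from /proc/<pid>/status; VmRSS is absent for kernel threads."""
    name, rss = "?", None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss = int(value.split()[0])
    return name, rss


def top_rss(n=5, proc="/proc"):
    """Return the n largest processes by resident set size as (rss_kb, pid, name)."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                name, rss = parse_status(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if rss is not None:
            rows.append((rss, int(entry), name))
    rows.sort(reverse=True)
    return rows[:n]


def log_forever(path="memlog.txt", interval=60):
    """Append a timestamped snapshot of the top processes every `interval` seconds."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(path, "a") as log:
            for rss, pid, name in top_rss():
                log.write(f"{stamp} {pid} {name} {rss} kB\n")
        time.sleep(interval)
```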

There is no history of memory usage in Linux by default, but you can get one with a simple command-line tool like sar.
Regarding your problem with memory:
If it was the OOM killer that made a mess of the machine, then you have one great option to ensure it won't happen again (after reducing the JVM heap size, of course).
By default the Linux kernel lets processes allocate more memory than it actually has. In some cases this can lead to the OOM killer killing the most memory-hungry process if there is no memory left for kernel tasks.
This behavior is controlled by the vm.overcommit_memory sysctl parameter.
So you can try setting vm.overcommit_memory = 2 in /etc/sysctl.conf and then running sysctl -p.
This will forbid overcommitting and make it very unlikely that the OOM killer does nasty things. You can also think about adding a little swap space (if you don't have it already) and setting vm.swappiness to a really low value (like 5, for example; the default is 60), so that in normal operation your application won't go into swap, but if you're really short on memory it will start using swap temporarily, and you will be able to see that even with free.
WARNING: this can lead to processes receiving "Cannot allocate memory" errors if your server is overloaded by memory. In this case:
Try to restrict memory usage by applications
Move part of them to another machine
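Why that warning matters can be seen from the strict-accounting arithmetic in the kernel's overcommit documentation: with vm.overcommit_memory = 2, the commit limit is swap plus vm.overcommit_ratio percent (default 50) of RAM, excluding huge pages. So with the default ratio and no swap, only about half of RAM can be committed. This sketch computes that limit and the remaining headroom from the CommitLimit and Committed_AS fields of /proc/meminfo; the function names are hypothetical.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """CommitLimit in kB under vm.overcommit_memory = 2 (strict accounting)."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def headroom_kb(meminfo):
    """kB that can still be committed, from /proc/meminfo CommitLimit and Committed_AS."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]
```

For an 8 GiB machine with 2 GiB of swap and the default ratio, the limit works out to 6 GiB, which is why adding a little swap or raising vm.overcommit_ratio may be needed before switching modes.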

How do I measure CPU, memory and disk usage during a build?

I'm trying to improve my build times and want to have some feedback in place to measure where my problems are.
I'm using GNU Make on a Linux CentOS system to build the Linux kernel along with some application code. I can run Make with 'time' to get an overall time for the complete build, but that doesn't tell me where the bottlenecks are.
I used -j with Make to run it on multiple cores on my build machine, but I ran top during the build and noticed the CPU cores were often idle.
Any suggestions for the best way to measure disk and memory usage during the build?
Anything else I should be measuring?
No preference on text-based or GUI - whatever gives me some data I can use.
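For whole-build figures beyond what time prints, one option is to run the build from Python and read getrusage(RUSAGE_CHILDREN) before and after. The following is a sketch: the function name is hypothetical, and per getrusage(2), RUSAGE_CHILDREN covers descendants that have terminated and been waited for (make and the compilers it spawns), while ru_maxrss is the peak RSS of the single largest descendant (in kB on Linux), not the whole tree's total.

```python
import resource
import subprocess
import time


def run_and_measure(cmd):
    """Run cmd to completion and report wall time, CPU time, peak RSS and block I/O."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    returncode = subprocess.run(cmd).returncode
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": returncode,
        "wall_s": wall,
        "cpu_s": cpu,
        # Average number of cores kept busy; compare with the -j value and core count.
        "avg_busy_cores": cpu / wall if wall > 0 else 0.0,
        # Largest single descendant so far, not a per-build delta.
        "max_rss_kb": after.ru_maxrss,
        "blocks_in": after.ru_inblock - before.ru_inblock,
        "blocks_out": after.ru_oublock - before.ru_oublock,
    }
```

For example, run_and_measure(["make", "-j8"]) returning an avg_busy_cores well below 8 would be consistent with the idle cores seen in top.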

For real-time measurement I use the text-based htop from third-party repositories. It is like top but better: it graphically shows CPU load (each CPU separately) and RAM usage.
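To log how often cores sit idle during a build, rather than watching htop, /proc/stat can be sampled directly. In this sketch, based on the proc(5) field order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice, only the first eight fields are summed because guest time is already counted in user. The function names are hypothetical. High iowait may point at the disk, while idle time without iowait may point at serialization in the makefiles.

```python
import time
from pathlib import Path


def read_cpu_lines(text):
    """Map 'cpu' (all CPUs), 'cpu0', 'cpu1', ... to their first 8 tick counters."""
    cpus = {}
    for line in text.splitlines():
        if line.startswith("cpu"):
            name, *fields = line.split()
            cpus[name] = [int(x) for x in fields[:8]]  # user..steal; guest is inside user
    return cpus


def idle_and_iowait(before, after):
    """Percent of time each CPU was idle and in iowait between two snapshots."""
    result = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            continue  # CPU went offline between snapshots
        deltas = [b - a for a, b in zip(old, new)]
        total = sum(deltas)
        if total == 0:
            result[name] = (0.0, 0.0)
        else:
            result[name] = (100.0 * deltas[3] / total, 100.0 * deltas[4] / total)
    return result


def watch(interval=2.0):
    """Print idle/iowait percentages for every CPU every `interval` seconds."""
    stat = Path("/proc/stat")
    before = read_cpu_lines(stat.read_text())
    while True:
        time.sleep(interval)
        after = read_cpu_lines(stat.read_text())
        for name, (idle, iowait) in sorted(idle_and_iowait(before, after).items()):
            print(f"{name}: idle {idle:5.1f}%  iowait {iowait:5.1f}%")
        before = after
```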

The output of my Fortran code is "killed", any suggestions?

I'm trying to run code over ssh that works perfectly for a smaller mesh, but since the new mesh is much bigger, I used this ifort command to compile it:
ifort -mcmodel=medium -i-dynamic -o test.out *.f
It compiles, but when I run it, the output is:
killed
I know the problem is memory. Does anyone know if there's any way to run it?
How can I find out where in the code the memory problem is caused?
Thanks
shadi

From the ifort command line, I think you are running on Linux.
Seeing "killed" as output is generally the result of Linux's Out Of Memory (OOM) killer getting involved to prevent an impending crash. Because it is common practice for applications to ask for more memory than they need, requests for more memory than is currently available are accepted; check for "Out of Memory: Killed process [PID] [process name]" in the system log files. The OOM killer is generally pretty good at disposing of the application responsible for using all the memory, so the place to start is your application's memory usage.
The first thing to do is to try to estimate (even if only roughly) how much memory you expect your application to use. One approach is to guesstimate the size of the major arrays and multiply the number of elements by the number of bytes needed per element. Another approach is to think about how you would expect the memory use to grow with mesh size. You can study this by experiment (run with different mesh sizes, measure the memory use and extrapolate) or from one measurement and knowledge of how the major arrays scale. It may be that you are asking for much more memory than you have on the machine, and the solution to this is probably to get access to a bigger computer (or to find an alternative algorithm which uses less memory).
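Both estimation approaches can be sketched in a few lines. The array shapes are hypothetical; 8 bytes per element corresponds to double precision (real*8) and 4 to default real. The extrapolation assumes memory follows a single power law, mem = c * n**p, fitted through two measured runs, which is only a rough model.

```python
import math


def array_bytes(shape, bytes_per_element=8):
    """Bytes needed for one array of the given shape (8 = double precision)."""
    return math.prod(shape) * bytes_per_element


def extrapolate(n1, mem1, n2, mem2, n_target):
    """Fit mem = c * n**p through two measurements; return (predicted_mem, p)."""
    p = math.log(mem2 / mem1) / math.log(n2 / n1)
    return mem2 * (n_target / n2) ** p, p
```

For example, a single 1000 x 1000 x 1000 double-precision array already needs 8 GB, and a fitted exponent near 3 means doubling the mesh resolution multiplies memory by about 8.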
If there is a memory leak, you should see more memory use than expected, even for the smaller mesh size. If this is the case, valgrind should help. Moving from static to dynamic storage probably isn't going to help here; I would expect to see a segmentation fault if you were just exceeding the available space on the stack.

Try using valgrind. I used it to find memory leaks in my Fortran code with good success.
http://valgrind.org/
