I am trying to configure monitoring on a variety of Kubernetes (GKE) nodes, specifically to identify [near] out-of-memory conditions. The documentation for node/memory/allocatable_utilization states:
This value cannot exceed 1 as usage cannot exceed allocatable memory bytes.
However, it reports a non-evictable value greater than 1 (1.015), which contradicts that constraint. It is also not clear to me how this corresponds to the actual condition on the node, as shown by free -m:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15038       10041         184          67        4812        4606
Swap:             0           0           0
This node is designed to run memory-intensive workloads (Java) and as such this is in line with what I'd expect per our heap size planning.
Why would Stackdriver report this value with those conditions on the node?
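One detail worth keeping in mind: the metric's denominator is allocatable memory, which is node capacity minus the kubelet's system reservations and eviction threshold, so it is smaller than the "total" that free reports. A rough Python sketch (the reservation figures below are illustrative assumptions, not GKE's actual values):

```python
# Illustrative sketch (not GKE's exact reservation values): the metric's
# denominator is allocatable = capacity - system reservations - eviction
# threshold, which is smaller than the "total" that free reports.

def allocatable_utilization(usage_mib, capacity_mib, reserved_mib, eviction_mib):
    allocatable = capacity_mib - reserved_mib - eviction_mib
    return usage_mib / allocatable

# With ~15038 MiB capacity (from free -m above), a hypothetical 1800 MiB
# of kube/system reservations and a 100 MiB eviction threshold leave
# ~13138 MiB allocatable. A usage figure that also counts kernel and
# pod-charged page-cache memory can then push the ratio past 1 even
# though used < total:
print(allocatable_utilization(13300, 15038, 1800, 100))  # ~1.012
```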
Related
I have a one-node Kubernetes cluster, and the memory usage reported by the metrics server does not seem to match the memory usage shown by the free command.
# kubectl top nodes
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
<node_ip>    1631m        10%    13477Mi         43%

# free -m
              total        used        free      shared  buff/cache   available
Mem:          32010       10794         488          81       20727       19133
Swap:         16127        1735       14392
And the difference is significant: about 3 GB.
I have also tested this on a 3 node cluster, and the issue is present there too:
# kubectl top nodes
NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
<node_ip1>    1254m        8%     26211Mi         84%
<node_ip2>    221m         1%     5021Mi          16%
<node_ip3>    363m         2%     8731Mi          28%
<node_ip4>    1860m        11%    20399Mi         66%

# free -m (this is on node 1)
              total        used        free      shared  buff/cache   available
Mem:          32010        5787        369        1676       25853       24128
Swap:         16127           0      16127
Why is there a difference?
The answer to your question can be found here. It is a duplicate, so you can remove this post from Stack Overflow.
The metrics exposed by the Metrics Server are collected by an instance of cAdvisor on each node. What you see in the output of kubectl top node is how cAdvisor determines the current resource usage.
So, apparently, cAdvisor and free determine resource usage in different ways. To find out why, you would need to dig into the internals of how cAdvisor and free work.
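To make the difference concrete, here is a rough sketch of the two accounting methods using the one-node numbers from the question. cAdvisor's working-set metric keeps active page cache in the count, while free's "used" excludes buff/cache entirely; the active/inactive split below is an assumption for illustration, not a measured value.

```python
# Rough comparison of the two accounting methods, using the one-node
# numbers from the question (MiB). The active/inactive split of the page
# cache below is an assumption for illustration, not a measured value.

total, used, buff_cache = 32010, 10794, 20727

# free's "used" excludes all of buff/cache:
free_used = used

# A cAdvisor-style working set (usage minus *inactive* file pages) keeps
# active page cache in the count. Suppose ~2700 MiB of cache is active:
active_file = 2700
working_set = used + active_file

print(free_used, working_set)  # 10794 13494 -- close to the 13477Mi shown
```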
We have two machines with identical configuration and use (they host two load-balanced Siebel application servers).
Normally, RAM usage is very similar on both (around 7 GB).
Recently, we've had a sudden increase of RAM usage on only one of them, and that machine is now using close to 14 GB.
So, for two very similar boxes, one is using 7 GB of RAM while the other is consuming 14 GB.
Now, using the ps aux command to determine which process is using all this additional memory, we see that memory consumption is very similar on both machines. Somehow, we don't see any process using those 7 GB of additional RAM.
Let's see:
Machine 1:
             total       used       free     shared    buffers     cached
Mem:         15943      15739        204          0        221       1267
-/+ buffers/cache:      14249       1693
Swap:         8191          0       8191
So, we have 14249 MB of RAM in use.
Machine 2:
             total       used       free     shared    buffers     cached
Mem:         15943      15636        306          0        962       6409
-/+ buffers/cache:       8264       7678
Swap:         8191          0       8191
So, we have 8264 MB of RAM in use.
I would expect the sum of the resident set sizes (RSS) reported by ps to be equal to or greater than this value. According to this answer, RSS is how much of a process's memory is allocated and resident in RAM (including memory from shared libraries). We don't have any memory in swap.
However:
Machine 1:
ps aux | awk 'BEGIN {sum=0} {sum +=$6} END {print sum/1024}'
8357.08
8357.08 < 14249 -> NOK!
Machine 2:
ps aux | awk 'BEGIN {sum=0} {sum +=$6} END {print sum/1024}'
8468.63
8468.63 > 8264 -> OK
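For reference, the same sum can be taken directly from /proc, which avoids any ps column-parsing surprises. A small Python sketch (assumes a Linux /proc filesystem):

```python
# Sketch: sum resident set sizes directly from /proc, equivalent to the
# ps/awk pipeline above. Assumes a Linux /proc filesystem.
import glob
import re

def total_rss_mib():
    total_kib = 0
    for status in glob.glob('/proc/[0-9]*/status'):
        try:
            with open(status) as f:
                m = re.search(r'^VmRSS:\s+(\d+) kB', f.read(), re.M)
            if m:
                total_kib += int(m.group(1))
        except OSError:
            pass  # process exited while we were scanning
    return total_kib / 1024

print(round(total_rss_mib(), 2))
```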
What am I getting wrong? How can I find where this "missing" memory is?
Thank you in advance
If those two are virtual machines, the "missing" memory may be occupied by the balloon driver, especially if they are hosted on VMware ESXi.
I recently encountered a similar scenario: the sum of all process RSS was 14 GB, while free showed 26 GB used, so 12 GB of memory was missing.
After searching the internet, I followed this article and ran vmware-toolbox-cmd stat balloon on my VM; the console showed 12xxx MB (used by the balloon). Bingo!
Here is my current setting:
vm.overcommit_ratio = 50 (default)
vm.overcommit_memory = 2
And Current Memory Usage:
[localhost~]$ free -g
             total       used       free     shared    buffers     cached
Mem:            47         46          0          0          0         45
-/+ buffers/cache:          1         45
Swap:           47          0         47
As per the documentation, what I understood is:
vm.overcommit_memory = 2 will not allow overcommitting more than 50% of RAM (as vm.overcommit_ratio is 50), but I can still see that current memory usage is 46 GB out of 47 GB.
Did I misunderstand anything?
I believe the default for vm.overcommit_memory is 0 and not 2. Is the overcommit_ratio only relevant to mode 2? I assume yes, but I'm not entirely sure.
From https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
0 - Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a seriously
wild allocation fails while allowing overcommit to reduce swap
usage. root is allowed to allocate slightly more memory in this
mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications.
Classic example is code using sparse arrays and just relying on the
virtual memory consisting almost entirely of zero pages.
2 - Don't overcommit. The total address space commit for the system
is not permitted to exceed swap + a configurable amount (default is
50%) of physical RAM. Depending on the amount you use, in most
situations this means a process will not be killed while accessing
pages but will receive errors on memory allocation as appropriate.
Instead of free -g, which I assume rounds down to zero, you might want to use free -m or just free to be more precise.
This might be interesting as well:
cat /proc/meminfo | grep Commit
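For mode 2, the limit those Commit* lines reflect can also be computed by hand: CommitLimit = swap + (overcommit_ratio / 100) x RAM. A quick sketch using the free -g figures from the question:

```python
# CommitLimit under vm.overcommit_memory = 2, per the kernel docs quoted
# above: swap + (overcommit_ratio / 100) * RAM. Figures in GiB are taken
# from the free -g output in the question.

def commit_limit_gib(ram_gib, swap_gib, ratio_pct):
    return swap_gib + ram_gib * ratio_pct / 100

limit = commit_limit_gib(ram_gib=47, swap_gib=47, ratio_pct=50)
print(limit)  # 70.5
# Note: mode 2 caps *committed address space*, not resident memory or the
# page cache, which is why free can still show 45 GiB "cached" here.
```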
I was running an application that was to load about 60 million items into memcached. I had two servers added in a bucket. After about 65% of the data was loaded, I saw 1.3 million items evicted on both servers. These were the statistics at that point.
On server 1
STAT bytes_written 619117542
STAT limit_maxbytes 3145728000
On server 2
STAT bytes_written 619118863
STAT limit_maxbytes 3145728000
Here's the output of free -m at that point in time.
On server 1
             total       used       free     shared    buffers     cached
Mem:          7987       5965       2021          0        310        441
-/+ buffers/cache:       5213       2774
Swap:         4095          0       4095
On server 2
             total       used       free     shared    buffers     cached
Mem:         11980      11873        106          0        207       5860
-/+ buffers/cache:       5805       6174
Swap:         5119          0       5119
As we can see, limit_maxbytes was not reached on either server; only about 600 MB was used in both places. However, on server 2, free memory dipped to as low as 100 MB. Now, I know that cached is 5.8 GB and that Linux could free that memory for running processes. But it looks like that didn't happen, and seeing memory reach a critical level, memcached started evicting items.
Or is there some other reason? When exactly does Linux free up cache memory? Is 100 MB of free RAM still not critical enough for Linux to free up cache? Please help me understand why such an event occurred.
The 'slabs' refer to how memcached allocates memory. Rather than doing complex exact-fit allocation, it puts your data into a close-enough (slightly larger) chunk of memory within the server. This means that it will frequently 'waste' memory that isn't storing your data.
You can tweak how big each potential slot is, though, when you start the memcached server, with the growth factor (-f) and minimum chunk size (-n) options. How you set those depends on the mix of sizes you are storing in the cache.
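To illustrate the geometry, here is a small sketch of how slab classes grow; the 96-byte starting size and 8-byte alignment are simplifications of what memcached actually computes from -n plus per-item overhead, not its exact values.

```python
# Sketch of memcached's slab-class sizing: chunk sizes grow geometrically
# from a minimum size by the growth factor (-f, default 1.25). The 96-byte
# starting size and 8-byte alignment are simplifications of what memcached
# actually computes from -n plus per-item overhead.

def slab_chunk_sizes(min_size=96, factor=1.25, max_size=1024 * 1024):
    sizes, size = [], float(min_size)
    while size < max_size:
        aligned = (int(size) + 7) // 8 * 8  # round up to 8-byte boundary
        if not sizes or aligned != sizes[-1]:
            sizes.append(aligned)
        size *= factor
    return sizes

sizes = slab_chunk_sizes()
# An item needing 100 bytes lands in the smallest chunk >= 100 bytes,
# wasting the difference:
chunk = next(s for s in sizes if s >= 100)
print(chunk, chunk - 100)  # 120 20
```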
I am using the free -m -t command to monitor my Linux system and get:
             total       used       free     shared    buffers     cached
Mem:         64334      64120        213          0        701      33216
-/+ buffers/cache:      30202      34131
Swap:          996          0        996
Total:       65330      64120       1209
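That -/+ buffers/cache row is pure arithmetic on the Mem: row, which a quick sketch can verify:

```python
# The "-/+ buffers/cache" row in older free output is derived from the
# Mem: row: used minus buffers/cache, and free plus buffers/cache.
# Values in MiB from the output above.

used, free_, buffers, cached = 64120, 213, 701, 33216

used_by_processes = used - buffers - cached       # memory apps really hold
free_for_processes = free_ + buffers + cached     # reclaimable + truly free

print(used_by_processes, free_for_processes)  # 30203 34130
# free itself printed 30202/34131; the off-by-one comes from free rounding
# KiB to MiB per field before we redo the arithmetic.
```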
This means 30 GB of physical memory is used by user processes.
But when using the top command and sorting by memory usage, only 3-4 GB of memory is used by all application processes.
Why does this inconsistency happen?
As I understand it, the amount of memory that top shows as used includes cold memory from older processes that are no longer running. This is because, in case of a restart of such a process, the required data may still be in memory, enabling the system to start the process faster and more efficiently instead of always reloading the data from disk.
Or, in short: Linux generally frees cold data in memory as late as possible.
Hope that clears it up :)