We have two machines with identical configuration and workload (they host two load-balanced Siebel application servers).
Normally, RAM usage is very similar on both (around 7 GB).
Recently, we've had a sudden increase in RAM usage on only one of them, and that machine now uses close to 14 GB of RAM.
So, for very similar boxes, one is using 7 GB of RAM while the other is consuming 14 GB.
Using ps aux to determine which process is using all this additional memory, we see that per-process memory consumption is very similar on both machines. We don't see any process that accounts for those 7 GB of additional RAM.
Let's see:
Machine 1:
             total       used       free     shared    buffers     cached
Mem:         15943      15739        204          0        221       1267
-/+ buffers/cache:      14249       1693
Swap:         8191          0       8191
So 14249 MB of RAM are in use.
Machine 2:
             total       used       free     shared    buffers     cached
Mem:         15943      15636        306          0        962       6409
-/+ buffers/cache:       8264       7678
Swap:         8191          0       8191
So 8264 MB of RAM are in use.
I would expect the sum of the Resident Set Size (RSS) values reported by ps to be equal to or greater than this value. According to this answer, RSS is how much memory is allocated to the process and is in RAM (including memory from shared libraries). Nothing is in swap.
However:
Machine 1:
ps aux | awk 'BEGIN {sum=0} {sum +=$6} END {print sum/1024}'
8357.08
8357.08 < 14249 -> NOK!
Machine 2:
ps aux | awk 'BEGIN {sum=0} {sum +=$6} END {print sum/1024}'
8468.63
8468.63 > 8264 -> OK
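The same comparison can be written as a small calculation. This is only a sketch in Python using the figures above; the function names are made up for illustration. Keep in mind that summed RSS counts shared pages once per process, so it can overstate userspace usage, while memory held by the kernel itself never shows up in any process's RSS.

```python
def sum_rss_mb(ps_aux_output):
    """Sum the RSS column (6th field, in KiB) of `ps aux` output, in MiB."""
    total_kib = 0
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[5].isdigit():
            total_kib += int(fields[5])
    return total_kib / 1024


def unaccounted_mb(used_minus_cache_mb, rss_sum_mb):
    """Memory that free counts as used but that no process RSS explains.

    A negative result means the summed RSS exceeds the used figure, which
    can happen because shared pages are counted once per process.
    """
    return used_minus_cache_mb - rss_sum_mb


# Figures taken from the free and ps output of the two machines.
machine1 = unaccounted_mb(14249, 8357.08)
machine2 = unaccounted_mb(8264, 8468.63)
print(f"Machine 1: {machine1:.2f} MB not explained by process RSS")
print(f"Machine 2: {machine2:.2f} MB not explained by process RSS")
```

On Machine 1 the gap is close to 5.9 GB, which is the amount to look for outside the process list.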
What am I getting wrong? How can I find where this "missing" memory is?
Thank you in advance
If the two are virtual machines, the "missing" memory may be held by the balloon driver, especially if they are hosted on VMware ESXi.
I recently encountered a similar scenario. The sum of all process RSS was 14 GB, but free showed 26 GB used, so 12 GB of memory were missing.
After searching the internet, I followed this article and ran vmware-toolbox-cmd stat balloon on my VM. The console showed 12xxxMB (used by the balloon). BINGO!
Here is my current setting:
vm.overcommit_ratio = 50 (default)
vm.overcommit_memory = 2
And Current Memory Usage:
[localhost~]$ free -g
             total       used       free     shared    buffers     cached
Mem:            47         46          0          0          0         45
-/+ buffers/cache:          1         45
Swap:           47          0         47
My understanding from the documentation is:
vm.overcommit_memory = 2 does not allow more than 50% of RAM to be committed (as vm.overcommit_ratio is 50), yet I can see that current memory usage is 46 GB out of 47 GB.
Have I misunderstood something?
I believe the default for vm.overcommit_memory is 0 and not 2. Is the overcommit_ratio only relevant to mode 2? I assume yes, but I'm not entirely sure.
From https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
0 - Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a seriously
wild allocation fails while allowing overcommit to reduce swap
usage. root is allowed to allocate slightly more memory in this
mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications.
Classic example is code using sparse arrays and just relying on the
virtual memory consisting almost entirely of zero pages.
2 - Don't overcommit. The total address space commit for the system
is not permitted to exceed swap + a configurable amount (default is
50%) of physical RAM. Depending on the amount you use, in most
situations this means a process will not be killed while accessing
pages but will receive errors on memory allocation as appropriate.
Instead of free -g, which I assume rounds values below 1 GB down to zero, you might want to use free -m or just free to be more precise.
This might be interesting as well:
cat /proc/meminfo|grep Commit
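For what it's worth, in mode 2 the kernel checks committed address space (Committed_AS) against CommitLimit, not against the "used" column of free, which in the output above is almost entirely page cache. Here is a rough sketch of the limit calculation in Python, assuming the rounded 47 GB of RAM and 47 GB of swap from the free -g output and no huge pages; the function name is made up for illustration.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio=50, hugepages_gb=0.0):
    """Approximate CommitLimit for vm.overcommit_memory = 2.

    The limit is swap plus overcommit_ratio percent of the RAM that is
    not reserved for huge pages.
    """
    return swap_gb + (ram_gb - hugepages_gb) * overcommit_ratio / 100


# Rounded values from the free -g output; the real figures are in
# /proc/meminfo and will differ slightly.
limit = commit_limit_gb(ram_gb=47, swap_gb=47)
print(f"CommitLimit is roughly {limit:.1f} GB")
```

Comparing the CommitLimit and Committed_AS lines in /proc/meminfo shows how close the system actually is to that limit.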
We run a Debian server with 64 GB of RAM for large Python simulations.
The problem we face is that a large amount of this memory is in use and we don't know why or how to correct that.
It appears it is not a cache/buffer thing:
free -m
             total       used       free     shared    buffers     cached
Mem:         64454      56243       8211         20          6        113
-/+ buffers/cache:      56122       8332
Swap:        21051       5834      15217
smem shows that, after a few days, up to 37 GB are allocated to kernel dynamic memory.
Area                          Used      Cache   Noncache
firmware/hardware                0          0          0
kernel image                     0          0          0
kernel dynamic memory        36.8G     431.0M      36.4G
userspace memory              4.5G     149.7M       4.4G
free memory                  21.6G      21.6G          0
----------------------------------------------------------
                             62.9G      22.2G      40.8G
We rebooted the server yesterday. At startup it showed 1.5 GB of kernel dynamic memory, but the figure slowly increases.
24 hours later, it has already reached 17 GB:
Area                          Used      Cache   Noncache
firmware/hardware                0          0          0
kernel image                     0          0          0
kernel dynamic memory        17.1G     269.3M      16.8G
userspace memory             36.4G      73.0M      36.3G
free memory                   9.4G       9.4G          0
----------------------------------------------------------
                             62.9G       9.8G      53.2G
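The smem rows are consistent with kernel dynamic memory being whatever remains once userspace and free memory are subtracted from the total. A short Python check of that arithmetic with the figures above, plus the growth rate implied by the 1.5 GB starting value. Treating the growth as linear is an assumption made only for illustration, and the function name is hypothetical.

```python
def kernel_dynamic_gb(total_gb, userspace_gb, free_gb):
    """Memory that is neither free nor mapped into userspace."""
    return total_gb - userspace_gb - free_gb


# "Used" column of the two smem tables.
after_days = kernel_dynamic_gb(62.9, 4.5, 21.6)
after_24h = kernel_dynamic_gb(62.9, 36.4, 9.4)

# Assumed linear growth from the 1.5 GB seen right after the reboot.
growth_per_hour = (after_24h - 1.5) / 24

print(f"after a few days: {after_days:.1f} GB")
print(f"24 h after reboot: {after_24h:.1f} GB")
print(f"average growth: {growth_per_hour:.2f} GB per hour")
```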
Any idea how to investigate further? And if this really is a memory leak, what should we do? (The kernel is 3.16.)
Thanks in advance
My Ubuntu 12 server is mysteriously losing/wasting memory. It has 64GB of RAM. About 46GB are shown as used even when I shut down all my applications. This memory is not reported as used for buffers or caching.
The result of top (while my apps are running; the apps use about 9G):
top - 21:22:48 up 46 days, 10:12,  1 user,  load average: 0.01, 0.09, 0.12
Tasks: 635 total,   1 running, 633 sleeping,   1 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65960100k total, 55038076k used, 10922024k free,   271700k buffers
Swap:        0k total,        0k used,        0k free,  4860768k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5303 1002      20   0 26.2g 1.2g  12m S    0  1.8   2:08.21 java
 5263 1003      20   0  9.8g 995m 4544 S    0  1.5   0:19.82 mysqld
 7021 www-data  20   0 3780m  18m 2460 S    0  0.0   8:37.50 apache2
 7022 www-data  20   0 3780m  18m 2540 S    0  0.0   8:38.28 apache2
.... (smaller processes)
Note that top reports 4.8G for cached, not 48G, and it's 55G that are used. The result of free -m:
             total       used       free     shared    buffers     cached
Mem:         64414      53747      10666          0        265       4746
-/+ buffers/cache:      48735      15678
Swap:            0          0          0
What is using my memory? I've tried every diagnostic that I could come across. Forums are swamped with people asking the same question because Linux is using their RAM for buffers/cache. This doesn't seem to be what is going on here.
It might be relevant that the system is a host for lxc containers. The top and free results reported above are from the host, but similar memory usage is reported within the containers. Stopping all containers does not free up the memory. Some 46G remain in use. However, if I restart the host the memory is free. It takes a while to reach 46G again. (I don't know if it takes days or weeks. It takes more than a few hours.)
It might also be relevant that the system is using ZFS. ZFS has a reputation for being memory-hungry, but not to this extent. This system has two ZFS filesystems on two raidz pools, one of 1.5T and one of 200G. I have another server that exhibits exactly the same problem (46G used by nothing) and is configured pretty much identically, but with a 3T array instead of 1.5T. I have lots of snapshots (100 or so) for each ZFS filesystem. I normally have one snapshot of each filesystem mounted at any time. Unmounting those does not give me back my memory.
I can see that the VIRT numbers in the top output above coincide roughly with the memory used, but the memory remains used even after I shut down these apps, and even after I shut down the container that's running them.
EDIT: I tried adding some swap, and something interesting happened. I added 30G of swap. Moments later, the amount of memory marked as cached in top had increased from 5G to 25G. free -m indicated about 20G more usable memory. I added another 10G of swap, and cached memory rose to 33G. If I add another 10G of swap, I get 6G more recognized as cached. All this time, only a few kilobytes of swap are reported used. It's as if the kernel needed to have matching swap for every bit that it recognizes or reports as cached. Here is the output of top with 40G of swap:
top - 23:06:45 up 46 days, 11:56,  2 users,  load average: 0.01, 0.12, 0.13
Tasks: 586 total,   1 running, 585 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65960100k total, 64356228k used,  1603872k free,   197800k buffers
Swap: 39062488k total,     3128k used, 39059360k free, 33101572k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6440 1002      20   0 26.3g 1.5g  11m S    0  2.4   2:02.87 java
 6538 1003      20   0  9.8g 994m 4564 S    0  1.5   0:17.70 mysqld
 4707 dbourget  20   0 27472 8728 1692 S    0  0.0   0:00.38 bash
Any suggestions highly appreciated.
EDIT 2: Here are the arc* values from /proc/spl/kstat/zfs/arcstats
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    1531800648
arc_meta_limit                  4    8654946304
arc_meta_max                    4    8661962768
There is no L2ARC activated for ZFS
This memory is very likely used by the ZFS ARC and other ZFS-related data stored in kernel memory. The ARC is somewhat similar to the buffer cache, so there is generally nothing to worry about: ZFS releases this memory when there is demand for it.
However, there is a subtle difference between buffer cache memory and ARC memory. The former is immediately available for allocation, while the latter is not. ZFS monitors the amount of free RAM and, when it gets too low, releases RAM to other consumers.
This works fine with most applications, but a minority of them are either confused when a low amount of available RAM is reported, or allocate memory too quickly for the release process to keep up.
That is why ZFS allows you to reduce the maximum size the ARC is allowed to reach.
This setting goes in the /etc/modprobe.d/zfs.conf file.
For example, should you want the ARC never to exceed 32 GB, add this line:
options zfs zfs_arc_max=34359738368
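The value is the limit in bytes. A quick Python check of where 34359738368 comes from, assuming the 32 GB is meant as binary gigabytes (GiB); the helper name is made up for illustration.

```python
def gib_to_bytes(gib):
    """Convert binary gigabytes (GiB) to bytes."""
    return gib * 1024 ** 3


# Build the modprobe option line for a 32 GiB cap.
print(f"options zfs zfs_arc_max={gib_to_bytes(32)}")
```

Swapping in another number gives the line for a different cap, for example gib_to_bytes(16) for 16 GiB.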
To get the current ARC size and various other ARC statistics, run this command:
cat /proc/spl/kstat/zfs/arcstats
The size metric shows the current size of the ARC. Beware that other ZFS-related memory areas might also take a share of RAM and won't necessarily be released quickly even when they are no longer used. Finally, ZFS on Linux is certainly less mature than the native Solaris implementation, so you might be hit by a bug like this one.
Note too that, due to the shared storage pool design, unmounting a ZFS file system won't free any resources. You would need to export the pool for the memory to be eventually released.
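The arcstats values discussed above can also be read from a script. A minimal Python sketch, assuming the three-column name/type/data layout seen in the arc* excerpt in the question; the function name is made up, and the sample text reuses values from that excerpt rather than a full arcstats file.

```python
def parse_arcstats(text):
    """Parse 'name type data' rows into a dict of integers.

    Rows that do not have exactly three fields with a numeric last field
    (such as header rows) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


# Values copied from the arc* excerpt in the question.
sample = """\
arc_meta_used 4 1531800648
arc_meta_limit 4 8654946304
arc_meta_max 4 8661962768
"""

stats = parse_arcstats(sample)
for name, value in stats.items():
    print(f"{name}: {value / 1024 ** 3:.2f} GiB")
```

On a live system the text would come from reading /proc/spl/kstat/zfs/arcstats, and the row named size would give the current ARC size.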
I am using the "free -m -t" command to monitor my Linux system and get:
             total       used       free     shared    buffers     cached
Mem:         64334      64120        213          0        701      33216
-/+ buffers/cache:      30202      34131
Swap:          996          0        996
Total:       65330      64120       1209
This means 30GB of physical memory is used by user processes.
But when I use the top command and sort by memory usage, all the application processes together use only 3~4GB of memory.
Why does this inconsistency happen?
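For reference, the 30GB figure is what remains of the used column once buffers and cache are subtracted. A small Python sketch of that calculation, assuming the older free layout shown above with separate buffers and cached columns; the function name is made up for illustration.

```python
def parse_free_mem_line(free_output):
    """Return the numeric columns of the 'Mem:' row of old-style free output."""
    names = ("total", "used", "free", "shared", "buffers", "cached")
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:7]]
            return dict(zip(names, values))
    raise ValueError("no 'Mem:' row found")


# 'Mem:' row copied from the free -m -t output above (values in MB).
sample = "Mem: 64334 64120 213 0 701 33216"
mem = parse_free_mem_line(sample)
used_excluding_cache = mem["used"] - mem["buffers"] - mem["cached"]
print(used_excluding_cache)
```

The result is within 1 MB of the 30202 that free prints on its "-/+ buffers/cache" row; the difference is rounding. Note that this figure covers everything that is not free, buffers or cache, so it also includes memory used by the kernel itself, not only by user processes.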
As I understand it, the amount of memory that top shows as used includes cold data left behind by processes that are no longer running. If such a process is restarted, the data it needs may still be in memory, so the system can start it faster and more efficiently instead of always reloading the data from disk.
In short, Linux generally frees cold data in memory as late as possible.
Hope that clears it up :)