Memory usage: Program allocates too much memory - Linux

I have coded a program in C++ for Ubuntu Server (64-bit) that should run 24/7. The server has 2 GB of RAM, but apparently my program is allocating too much memory.
This is the output of top after about 2 hours:
top - 13:35:57 up 1:39, 1 user, load average: 0.15, 0.13, 0.08
Tasks: 68 total, 2 running, 66 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.9 us, 5.7 sy, 0.0 ni, 92.3 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 2050048 total, 540852 used, 1509196 free, 34872 buffers
KiB Swap: 1509372 total, 0 used, 1509372 free. 93060 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
902 root 20 0 1019896 364920 4492 S 13.1 17.8 13:07.03 Bether
As you can see, my program already consumes 17.8% of memory. At some point the server will crash because it has no memory left.
My problem is that the program should not do that, but I can't find out where memory gets allocated and never freed. Is there a tool, maybe even inside gdb, to find out where the program allocates the most memory?
Thanks in advance!

Check out Valgrind; it should be in the Ubuntu repository. It can give you detailed information about memory usage in C++ programs, kind of like a debugger for memory usage.
valgrind --tool=memcheck <your_app> <your_apps_params>
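For a more detailed report, a fuller invocation along these lines can help (a sketch: build your program with -g so Valgrind can show file and line numbers; --show-leak-kinds=all needs a reasonably recent Valgrind):
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes --log-file=valgrind.log <your_app> <your_apps_params>
The "definitely lost" entries in the resulting log point to allocations that were never freed, along with the stack trace of the allocation site.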
Also check out ccmalloc, NJAMD, LeakTracer

Related

How to correctly identify and correct a memory leak on a server?

We run a Debian server with 64 GB of RAM for large Python simulations.
The problem we face is that a large amount of this memory is getting used and we don't know why, or how to correct it.
It appears it is not a cache/buffer thing:
free -m
total used free shared buffers cached
Mem: 64454 56243 8211 20 6 113
-/+ buffers/cache: 56122 8332
Swap: 21051 5834 15217
When running smem, it shows that after a few days up to 37 GB are allocated as kernel dynamic memory.
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 36.8G 431.0M 36.4G
userspace memory 4.5G 149.7M 4.4G
free memory 21.6G 21.6G 0
----------------------------------------------------------
62.9G 22.2G 40.8G
We rebooted the server yesterday; while it showed only 1.5 GB of kernel dynamic memory at the start, that figure slowly increases.
24 hours later, it had already reached 17 GB:
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 17.1G 269.3M 16.8G
userspace memory 36.4G 73.0M 36.3G
free memory 9.4G 9.4G 0
----------------------------------------------------------
62.9G 9.8G 53.2G
Any idea how to investigate further? And if this really is a memory leak, what should we do? (The kernel is 3.16.)
Thanks in advance

Track down high CPU load average

I'm trying to understand what's going on with my server.
It's a 2-CPU server, so:
$> grep 'model name' /proc/cpuinfo | wc -l
2
While the load average is showing a run queue of ~8:
$> uptime
16:31:30 up 123 days, 9:04, 1 user, load average: 8.37, 8.48, 8.55
So you can assume the load is really high and things are piling up; there is sustained load on the system and it's not just a spike.
However, looking at the top CPU consumers:
> ps -eo pcpu,pid,user,args | sort -k 1 -r | head -6
%CPU PID USER COMMAND
8.3 27187 **** server_process_c
1.0 22248 **** server_process_b
0.5 22282 **** server_process_a
0.0 31167 root head -6
0.0 31166 root sort -k 1 -r
0.0 31165 root ps -eo pcpu,pid,user,args
Results of free command:
total used free shared buffers cached
Mem: 7986 7934 52 0 9 2446
-/+ buffers/cache: 5478 2508
Swap: 17407 60 17347
This is the result on an ongoing basis, i.e. not even a single CPU is fully used; the top consumer is always at ~8.5%.
My Question: What are my ways to track down the root of the high load?
Based on your free output, there are times when system memory is exhausted and swap gets used (see the swap used column = 60). Free memory before accounting for buffers and cache is almost zero, which means there are times when nearly all physical RAM is consumed.
On a server, try to avoid page faults, which cause data to be swapped between system memory and the swap area, as much as possible, because accessing the hard drive is far slower than accessing RAM.
In your top output, investigate the wa column. A higher percentage means the CPU spends more time waiting for disk I/O rather than doing meaningful computation.
Cpu(s): 87.3%us, 1.2%sy, 0.0%ni, 27.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
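If wa stays consistently high, a couple of extra commands can help narrow down whether processes are stuck waiting on disk (suggested follow-ups, not taken from the output above; iostat comes with the sysstat package):
vmstat 1
iostat -x 1
In vmstat, the b column counts processes blocked in uninterruptible sleep (usually waiting for I/O), and iostat -x shows per-device utilization.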
Try to disable daemons or services you do not need to reduce the memory footprint, and consider adding more RAM to the system.
For a 2-CPU server, the ideal load is less than 2.0 (less than 1.0 per CPU). A load of 8.0 means each CPU is carrying a load of roughly 4.0, which is not good.
Have you tried the htop command? It shows more information in a helpful way sometimes.

memory usage more than 100%

I am using an ARM processor and a Qt-based GUI application.
The process is running slowly.
Mem: 36272K used, 24692K free, 0K shrd, 188K buff, 19544K cached
CPU: 6.1% usr 1.3% sys 0.0% nic 92.4% idle 0.0% io 0.0% irq 0.0% sirq
Load average: 0.25 0.18 0.07 1/43 553
PID : 512
PPID : 1
USER : root
STAT : S
VSZ : 62368
%MEM : 102.0
CPU : 0
%CPU : 5.5
COMMAND : ./gopaljeearm -qws -nomouse
This is the status when I use the top command.
There is a very nice answer for Android applications which in turn should be applicable to most Linux applications. Quoting:
Note that memory usage on modern operating systems like Linux is an
extremely complicated and difficult to understand area. In fact the
chances of you actually correctly interpreting whatever numbers you
get is extremely low.
You can read the rest of it here.
Another nice read is ELC: How much memory are applications really using? from LWN.

Lost memory on Linux - not cached, not buffers

My Ubuntu 12 server is mysteriously losing/wasting memory. It has 64 GB of RAM. About 46 GB are shown as used even when I shut down all my applications. This memory is not reported as used for buffers or caching.
The result of top (while my apps are running; the apps use about 9G):
top - 21:22:48 up 46 days, 10:12, 1 user, load average: 0.01, 0.09, 0.12
Tasks: 635 total, 1 running, 633 sleeping, 1 stopped, 0 zombie
Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65960100k total, 55038076k used, 10922024k free, 271700k buffers
Swap: 0k total, 0k used, 0k free, 4860768k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5303 1002 20 0 26.2g 1.2g 12m S 0 1.8 2:08.21 java
5263 1003 20 0 9.8g 995m 4544 S 0 1.5 0:19.82 mysqld
7021 www-data 20 0 3780m 18m 2460 S 0 0.0 8:37.50 apache2
7022 www-data 20 0 3780m 18m 2540 S 0 0.0 8:38.28 apache2
.... (smaller processes)
Note that top reports 4.8G for cached, not 48G, and it's 55G that are used. The result of free -m:
total used free shared buffers cached
Mem: 64414 53747 10666 0 265 4746
-/+ buffers/cache: 48735 15678
Swap: 0 0 0
What is using my memory? I've tried every diagnostic I could come across. Forums are swamped with people asking this question because Linux is using their RAM for buffers/cache, but that doesn't seem to be what is going on here.
It might be relevant that the system is a host for LXC containers. The top and free results reported above are from the host, but similar memory usage is reported within the containers. Stopping all containers does not free up the memory; some 46G remain in use. However, if I restart the host, the memory is freed, and it takes a while to climb back to 46G. (I don't know if it takes days or weeks, but it's more than a few hours.)
It might also be relevant that the system is using ZFS. ZFS is reputedly memory-hungry, but not that much. This system has two ZFS filesystems on two raidz pools, one of 1.5T and one of 200G. I have another server that exhibits exactly the same problem (46G used by nothing) and is configured pretty much identically, but with a 3T array instead of 1.5T. I have lots of snapshots (100 or so) of each ZFS filesystem, and I normally have one snapshot of each filesystem mounted at any time. Unmounting those does not give me back my memory.
I can see that the VIRT numbers in the output above roughly coincide with the memory used, but the memory remains used even after I shut down these apps, and even after I shut down the container that's running them.
EDIT: I tried adding some swap, and something interesting happened. I added 30G of swap; moments later, the amount of memory marked as cached in top had increased from 5G to 25G, and free -m indicated about 20G more usable memory. I added another 10G of swap, and cached memory rose to 33G. If I add another 10G of swap, I get 6G more recognized as cached. All this time, only a few kilobytes of swap are reported used. It's as if the kernel needed matching swap for every bit that it recognizes or reports as cached. Here is the output of top with 40G of swap:
top - 23:06:45 up 46 days, 11:56, 2 users, load average: 0.01, 0.12, 0.13
Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65960100k total, 64356228k used, 1603872k free, 197800k buffers
Swap: 39062488k total, 3128k used, 39059360k free, 33101572k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6440 1002 20 0 26.3g 1.5g 11m S 0 2.4 2:02.87 java
6538 1003 20 0 9.8g 994m 4564 S 0 1.5 0:17.70 mysqld
4707 dbourget 20 0 27472 8728 1692 S 0 0.0 0:00.38 bash
Any suggestions highly appreciated.
EDIT 2: Here are the arc* values from /proc/spl/kstat/zfs/arcstats
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 1531800648
arc_meta_limit 4 8654946304
arc_meta_max 4 8661962768
There is no L2ARC activated for ZFS.
This memory is very likely used by the ZFS ARC cache and other ZFS-related data stored in kernel memory. The ARC cache is somewhat similar to the buffer cache, so there is generally nothing to worry about, as this memory is released by ZFS should there be demand for it.
However, there is a subtle difference between buffer cache memory and ARC cache memory. The former is immediately available for allocation, while the ARC cache is not: ZFS monitors the available free RAM and, when it gets too low, releases RAM to other consumers.
This works fine with most applications, but a minority of them are either confused when a low amount of available RAM is reported, or allocate memory too quickly for the release process to keep pace.
That's why ZFS allows you to limit the maximum size the ARC is allowed to reach.
This setting is done in the /etc/modprobe.d/zfs.conf file.
For example, should you want the ARC never to exceed 32 GB, add this line (the value is given in bytes):
options zfs zfs_arc_max=34359738368
To get the current ARC size and various other ARC statistics, run this command:
cat /proc/spl/kstat/zfs/arcstats
The size metric shows the current size of the ARC. Beware that other ZFS-related memory areas might also take a share of RAM and won't necessarily be released quickly even when no longer used. Finally, ZFS on Linux is certainly less mature than the native Solaris implementation, so you might be hit by a bug like this one.
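To pull out just that figure, something like this should work (a small sketch assuming the usual name/type/data column layout of arcstats; the value is in bytes):
awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats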
Note too that due to the shared storage pool design, unmounting a ZFS filesystem won't free any resources. You would need to export the pool for the memory to eventually be released.

Linux memory reporting discrepancy [closed]

I'm getting a memory usage discrepancy between meminfo and ps: free reports much less free memory than the per-process usage shown by ps would suggest.
According to free, I have only 3188 MB free:
free -m
total used free shared buffers cached
Mem: 15360 13273 2086 0 79 1022
-/+ buffers/cache: 12171 3188
Swap: 0 0 0
I tried to track down where the memory is going using ps (the output below is snipped to non-zero RSS values):
ps -A --sort -rss -o comm,pmem,rss
COMMAND %MEM RSS
mysqld 13.1 2062272
java 6.2 978072
ruby 0.7 114248
ruby 0.7 114144
squid 0.1 30716
ruby 0.0 11868
apache2 0.0 10132
apache2 0.0 9092
apache2 0.0 8504
PassengerHelper 0.0 5784
sshd 0.0 3008
apache2 0.0 2420
apache2 0.0 2228
bash 0.0 2120
sshd 0.0 1708
rsyslogd 0.0 1164
PassengerLoggin 0.0 880
ps 0.0 844
dbus-daemon 0.0 736
sshd 0.0 736
ntpd 0.0 664
squid 0.0 584
cron 0.0 532
ntpd 0.0 512
exim4 0.0 504
nrpe 0.0 496
PassengerWatchd 0.0 416
dhclient3 0.0 344
mysqld_safe 0.0 316
unlinkd 0.0 284
logger 0.0 252
init 0.0 200
getty 0.0 120
However, this doesn't make sense, as adding up the RSS column gives a total memory usage of only around 3287 MB, which should leave almost 12 GB free!
I'm using kernel 2.6.16.33-xenU #2 SMP x86_64 on Amazon AWS.
Where is my memory going? Can anyone shed some light on how to track this down?
Check the usage of the Slab cache (Slab:, SReclaimable: and SUnreclaim: in /proc/meminfo). This is a cache of in-kernel data structures, and is separate from the page cache reported by free.
If the slab cache is responsible for a large portion of your "missing memory", check /proc/slabinfo to see where it's gone. If it's dentries or inodes, you can use sync ; echo 2 > /proc/sys/vm/drop_caches to get rid of them.
You can also use the slabtop tool to show the current usage of the slab cache in a friendly format. Pressing c sorts the list by current cache size.
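For a quick non-interactive look, something along these lines should work (suggested commands: slabtop -o prints a single snapshot and -s c sorts by cache size):
grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo
sudo slabtop -o -s c | head -15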
You cannot just add up the RSS or VSZ columns to get the amount of memory used. Unfortunately, memory usage on Linux is much more complicated than that. For a more thorough description see Understanding memory usage on Linux, which explains how shared libraries are shared between processes, but double-counted by tools like ps.
I don't know offhand how free computes the numbers it displays but if you need more details you can always dig up its source code.
I believe that you are missing the shared memory values. I don't think ps reports the shared RAM as part of the RSS field. Compare with the top RES field to see.
Of course if you do add in the shared RAM, how much do you add? Because it is shared the same RAM may show up credited to many different processes.
You can try to solve that problem by creative parsing of the /proc/[pid]/smaps files.
But still, that only gets you part of the way. Some memory pages are shared but accounted as resident. These pages get shared after a fork() call. They can become unshared at any time, but until they are, they don't count toward total used system RAM. The proc smaps file doesn't show these either.
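As an illustration of that kind of parsing (a sketch, not taken from the answer above: the Pss field, available in smaps on reasonably recent kernels, divides each shared page among the processes sharing it, so summing it avoids most of the double counting; run as root so every process's smaps is readable):
cat /proc/[0-9]*/smaps 2>/dev/null | awk '/^Pss:/ {kb += $2} END {print kb " kB total Pss"}'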
