My embedded system runs Linux 3.10.14.
While running, my application prints this message:
ERR: Memory overflow! free bytes=56000, bytes used=4040000, bytes to allocate=84000
But when I do "free", it seems I have enough free memory.
/ # free
total used free shared buffers
Mem: 27652 20788 6864 0 0
-/+ buffers: 20788 6864
Swap: 0 0 0
What could be the root cause of this error message?
Or, how can I use the free memory down to the last byte?
Please comment if I am missing any information.
Thank you!
According to the output of free, the system has 27652 kB in total, of which 20788 kB are used and 6864 kB are free. Note that free reports kilobytes, not bytes.
Your application's message, on the other hand, says it tried to allocate 84000 bytes while only 56000 bytes were available and 4040000 bytes were already in use.
So the question is: how much memory does your system actually have, 27652 kB or roughly 4096000 bytes?
Does the error message come from the system, or from a memory pool managed inside your application?
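The two counters in the application's message add up suspiciously neatly, which supports the private-pool reading (a quick check, assuming both counters describe the same allocator):

```shell
# free bytes + used bytes from the application's error message
echo $(( 56000 + 4040000 ))   # prints 4096000
```

4096000 bytes is a round 4000 KiB, a plausible fixed pool size, and far smaller than the ~27 MB the kernel reports, so the overflow most likely happens inside an application-managed pool rather than in system memory.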
I am working on a per-process memory-monitoring (Bash) script, but it is turning out to be more of a headache than I thought, especially for forking services such as PostgreSQL. There are a couple of reasons:
RSS is a potential value to use as memory usage; however, it also includes shared libraries etc. that are used by other processes
PSS is another potential value which (should) show only the private memory of a process. The problem here is that PSS can only be retrieved from /proc/<pid>/smaps, which requires elevated capability privileges (or root)
USS (calculated as Private_Dirty + Private_Clean, source: How does smem calculate RSS, USS and PSS?) could also be a candidate, but here again we need access to /proc/<pid>/smaps
For now I am trying to solve the forked process problem by looping through each PID's smaps (as suggested in https://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/), for example:
for pid in $(pgrep -f "postgres"); do grep "^Pss:" "/proc/$pid/smaps"; done
Maybe some of the postgres processes should be excluded, I am not sure.
Using this method, I calculated and summed the PSS and USS values, resulting in:
PSS: 4817 MB - USS: 4547 MB - RES: 6176 MB - VIRT: 26851 MB used
Obviously this only works with elevated privileges, which I would prefer to avoid. Whether these values actually represent the truth is unknown, because other tools/commands show yet again different values.
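The smaps loop above can be extended into a small summing sketch. This is an illustrative script of my own, assuming /proc/<pid>/smaps is readable (i.e. the elevated privileges mentioned above) and that smaps reports values in kB:

```shell
#!/bin/sh
# Sum Pss and Uss (= Private_Clean + Private_Dirty) over all
# processes matching a pattern; smaps values are in kB.
pattern="${1:-postgres}"
for pid in $(pgrep -f "$pattern"); do
    cat "/proc/$pid/smaps" 2>/dev/null
done | awk '
    /^Pss:/           { pss += $2 }
    /^Private_Clean:/ { uss += $2 }
    /^Private_Dirty:/ { uss += $2 }
    END { printf "PSS: %d MB - USS: %d MB\n", pss/1024, uss/1024 }'
```

The `2>/dev/null` hides processes that exit mid-loop; the privilege problem of course remains, since reading another user's smaps still requires root or equivalent capabilities.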
Unfortunately, top and htop are unable to combine the postgres processes. atop is able to do this and subjectively seems the most accurate, with the following values:
NPROCS SYSCPU USRCPU VSIZE RSIZE PSIZE SWAPSZ RDDSK WRDSK RNET SNET MEM CMD 1/1
27 56m50s 16m40s 5.4G 1.1G 0K 2308K 0K 0K 0 0 11% postgres
Now to the question: What is the suggested and best way to retrieve the most accurate memory usage of an application with forked processes, such as PostgreSQL?
And in case atop already does an accurate calculation, how does atop arrive at the RSIZE value? Note that this value is shown for both root and non-root users, which probably means that /proc/<pid>/smaps is not used for the calculation.
Please comment if more information is needed.
EDIT: I actually found a bug in my pgrep pattern in my final script and it falsely parsed a lot more than just the postgres processes.
The new output now shows the same RES value as seen in atop RSIZE:
Script output:
PSS: 205 MB - USS: 60 MB - RES: 1162 MB - VIRT: 5506 MB
atop summarized postgres output:
NPROCS SYSCPU USRCPU VSIZE RSIZE PSIZE SWAPSZ RDDSK WRDSK RNET SNET MEM CMD
27 0.04s 0.10s 5.4G 1.1G 0K 2308K 0K 32K 0 0 11% postgres
But the question of course remains, unless summing the per-process RSS (RES) values is in fact already the most accurate way. Let me know your thoughts, thanks :)
I was configuring icinga2 to get memory-usage information from a Linux client, using the script check_snmp_mem.pl. Any idea how the memory used is derived in this script?
Here is the free command output:
# free
total used free shared buff/cache available
Mem: 500016 59160 89564 3036 351292 408972
Swap: 1048572 4092 1044480
whereas the performance data shown in the Icinga dashboard is
Label Value Max Warning Critical
ram_used 137,700.00 500,016.00 470,015.00 490,016.00
swap_used 4,092.00 1,048,572.00 524,286.00 838,858.00
Looking through the source code, it mentions ram_used for example in this line:
$n_output .= " | ram_used=" . ($$resultat{$nets_ram_total}-$$resultat{$nets_ram_free}-$$resultat{$nets_ram_cache}).";";
This strongly suggests that ram_used is calculated as total RAM minus free RAM minus cached RAM. These values are retrieved via the following SNMP OIDs:
my $nets_ram_free = "1.3.6.1.4.1.2021.4.6.0"; # Real memory free
my $nets_ram_total = "1.3.6.1.4.1.2021.4.5.0"; # Real memory total
my $nets_ram_cache = "1.3.6.1.4.1.2021.4.15.0"; # Real memory cached
I don't know exactly how these values correlate to the output of free. The difference between the free memory reported by free (89564) and the ram_used reported to Icinga (137700) is 48136, so maybe you can find that number somewhere.
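For what it's worth, plugging the numbers from the free output above into the plugin's formula does not reproduce Icinga's figure, which hints that the SNMP agent's cached value differs from free's buff/cache column (a quick check using the figures quoted above):

```shell
# ram_used = memTotalReal - memAvailReal - memCached,
# using the kB figures from the `free` output above
total=500016
free_mem=89564
cache=351292
echo $(( total - free_mem - cache ))   # prints 59160, free's own "used" column
```

The dashboard shows 137,700 instead, so the memCached value the SNMP agent reports on the client presumably does not match free's combined buff/cache figure.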
On inspecting a crash dump for an out-of-memory exception reported by a client, the results of !DumpHeap -stat showed that 575MB of memory is taken up by 45,000 objects of type "Free", most of which I assumed would have to reside in Gen 2 due to their size.
The first places I looked for problems were the large object heap (LOH) and pinned objects. The large object heap with free space included was only 70MB so that wasn't the issue and running !gchandles showed:
GC Handle Statistics:
Strong Handles: 155
Pinned Handles: 265
Async Pinned Handles: 8
Ref Count Handles: 163
Weak Long Handles: 0
Weak Short Handles: 0
Other Handles: 0
which is a very small number of handles (around 600) compared to the number of free objects (45,000). To me this rules out the free blocks being caused by pinning.
I also looked into the free blocks themselves to see if they had a consistent size, but on inspection the sizes varied widely, from just short of 5MB down to only around 12 bytes or so.
Any help would be appreciated! I am at a loss, since there is fragmentation but no sign of it being caused by the two places I know to look: the large object heap (LOH) and pinned handles.
In which generation are the free objects?
I assume would have to reside in Gen 2 due to the size
Size is not related to generations. To find out in which generation the free blocks reside, you can follow these steps:
From !dumpheap -stat -type Free get the method table:
0:003> !dumpheap -stat -type Free
total 7 objects
Statistics:
MT Count TotalSize Class Name
00723538 7 100 Free
Total 7 objects
From !eeheap -gc, get the start addresses of the generations.
0:003> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x026a1018
generation 1 starts at 0x026a100c
generation 2 starts at 0x026a1000
ephemeral segment allocation context: none
segment begin allocated size
026a0000 026a1000 02731ff4 0x00090ff4(593908)
Large object heap starts at 0x036a1000
segment begin allocated size
036a0000 036a1000 036a3250 0x00002250(8784)
Total Size 0x93244(602692)
------------------------------
GC Heap Size 0x93244(602692)
Then dump the free objects only of a specific generation by passing the start and end address (e.g. of generation 2 here):
0:003> !dumpheap -mt 00723538 0x026a1000 0x026a100c
Address MT Size
026a1000 00723538 12 Free
026a100c 00723538 12 Free
total 2 objects
Statistics:
MT Count TotalSize Class Name
00723538 2 24 Free
Total 2 objects
So in my simple case, there are 2 free objects in generation 2.
Pinned handles
600 pinned objects should not cause 45,000 free memory blocks. Still, from my experience, 600 pinned handles is a lot. But first, check in which generation the free memory blocks reside.
I have a lot of memory allocated in my Mono for Android program which doesn't appear to belong directly to either the Dalvik or the Mono heap. Also, I can't figure out how to track .NET memory leaks.
When I call
adb shell dumpsys meminfo MyProgram.Droid
This is the output:
** MEMINFO in pid 1364 [MyProgram.Droid] **
Shared Private Heap Heap Heap
Pss Dirty Dirty Size Alloc Free
------ ------ ------ ------ ------ ------
Native 36 24 36 38080 37775 124
Dalvik 6934 15164 6572 16839 15384 1455
Cursor 0 0 0
Ashmem 0 0 0
Other dev 4 36 0
.so mmap 12029 2416 9068
.jar mmap 0 0 0
.apk mmap 16920 0 0
.ttf mmap 3 0 0
.dex mmap 2299 296 8
Other mmap 64 24 36
Unknown 28920 8216 28728
TOTAL 67209 26176 44448 54919 53159 1579
I assume that the "Unknown" section is the Mono framework, including the .NET heap. However, when I call
GC.GetTotalMemory(true)
it tells me that only 5MB of memory is allocated. That leaves 23MB which I cannot trace (and there is 38MB of allocated native heap).
Additionally, I don't see that Xamarin has any tools for tracking .NET memory leaks. I've added garbage-collection logging with
adb shell setprop debug.mono.log gc,gref
But this is incredibly verbose, hard to read, and doesn't even include allocation sizes.
At this point I don't know what to do to track down the resulting leaks. Since the allocations appear to be on the native heap, do I need to use the NDK in order to track down what's going on? Are there any tools I can use on the C# side to track the .NET leaks?
Thanks!
I have the following little Ada program:
procedure Leaky_Main is
   task Beer;
   task body Beer is
   begin
      null;
   end Beer;
begin
   null;
end Leaky_Main;
All fairly basic, but when I compile it like this:
gnatmake -g -gnatwI leaky_main.adb
and run it through Valgrind like this:
valgrind --tool=memcheck -v --leak-check=full --read-var-info=yes --show-reachable=yes ./leaky_main
I get the following error summary:
==2882== 2,104 bytes in 1 blocks are still reachable in loss record 1 of 1
==2882== at 0x4028876: malloc (vg_replace_malloc.c:236)
==2882== by 0x42AD3B8: __gnat_malloc (in /usr/lib/i386-linux-gnu/libgnat-4.4.so.1)
==2882== by 0x40615FF: system__task_primitives__operations__new_atcb (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x406433C: system__tasking__initialize (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x4063C86: system__tasking__initialization__init_rts (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x4063DA6: system__tasking__initialization___elabb (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x8049ADA: adainit (b~leaky_main.adb:142)
==2882== by 0x8049B7C: main (b~leaky_main.adb:189)
==2882==
==2882== LEAK SUMMARY:
==2882== definitely lost: 0 bytes in 0 blocks
==2882== indirectly lost: 0 bytes in 0 blocks
==2882== possibly lost: 0 bytes in 0 blocks
==2882== still reachable: 2,104 bytes in 1 blocks
==2882== suppressed: 0 bytes in 0 blocks
==2882==
==2882== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 6)
--2882--
--2882-- used_suppression: 21 U1004-ARM-_dl_relocate_object
==2882==
==2882== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 6)
Does anyone know why this is being reported as an error?
I'm fairly sure there is no actual leak, but I would like to know why/how it happens.
Thanks,
The problematic allocation looks to be the task control block (TCB). This has to be retained after the task has completed so that you can say
if Beer'Terminated then
   ...
so I think it's probably an artefact of when Valgrind does the check.
I've only come across this where the task was dynamically allocated; it was necessary to wait until 'Terminated was True before deallocating the task, or GNAT happily deallocated the stack but silently didn't deallocate the TCB, leading to a real leak. AdaCore fixed this recently (I don't have the reference; it was on their developer log).
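If you just want Valgrind to stay quiet about this runtime allocation, a suppression file is the usual route. A minimal sketch, with the frame names copied from the report above (the suppression name is my own placeholder):

```
{
   gnat_tasking_atcb_reachable
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   fun:__gnat_malloc
   fun:system__task_primitives__operations__new_atcb
}
```

Save it as e.g. gnat.supp and add --suppressions=gnat.supp to the Valgrind command line. Note that match-leak-kinds needs a reasonably recent Valgrind (3.9+); on older versions, drop that line.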
You could also use Deleaker for debugging; I prefer it :)