I am trying to reproduce a problem.
My C code is giving SIGABRT; I traced it back to line 3174 here:
https://elixir.bootlin.com/glibc/glibc-2.27/source/malloc/malloc.c
/* Little security check which won't hurt performance: the allocator
never wrapps around at the end of the address space. Therefore
we can exclude some size values which might appear here by
accident or by "design" from some intruder. We need to bypass
this check for dumped fake mmap chunks from the old main arena
because the new malloc may provide additional alignment. */
if ((__builtin_expect ((uintptr_t) oldp > (uintptr_t) -oldsize, 0)
|| __builtin_expect (misaligned_chunk (oldp), 0))
&& !DUMPED_MAIN_ARENA_CHUNK (oldp))
malloc_printerr ("realloc(): invalid pointer");
My understanding is that memory gets allocated when I call calloc; then, when I call realloc and try to increase the memory area, the heap is not available for some reason and I get SIGABRT.
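For reference, here is a minimal sketch that appears to trigger the same check (a simplified guess on my part, not my real code): it passes realloc a pointer other than the one calloc returned.
#include <stdlib.h>
int main(void) {
    char *p = calloc(16, 1);
    if (p == NULL)
        return 1;
    /* p + 1 is not the start of a malloc chunk, so glibc's
     * misaligned_chunk() test in realloc fires, malloc_printerr()
     * is called and the process dies with SIGABRT
     * ("realloc(): invalid pointer"). */
    char *q = realloc(p + 1, 32);
    (void)q;
    return 0;
}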
My other question is: how can I limit the heap area to a small number of bytes, say 10 bytes, to replicate the problem? On Stack Overflow, RLIMIT and setrlimit are mentioned, but without sample code. Can you provide sample code where the heap size is 10 bytes?
How can I limit the heap area to some bytes say, 10 bytes
Can you provide sample code where heap size is 10 Bytes ?
From How to limit heap size for a c code in linux , you could do:
You could use (inside your program) setrlimit(2), probably with RLIMIT_AS (as cited by Ouah's answer).
#include <sys/resource.h>
int main() {
setrlimit(RLIMIT_AS, &(struct rlimit){10,10});
}
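If it helps, here is a slightly fuller sketch (my own variant, not from the linked answer) that sets the limit and then shows an allocation failing under it:
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
int main(void) {
    /* Cap the whole address space at 10 bytes (soft and hard limit).
     * Note that RLIMIT_AS covers all mappings, not just the heap, so
     * any further allocation request should fail. */
    struct rlimit lim = { .rlim_cur = 10, .rlim_max = 10 };
    if (setrlimit(RLIMIT_AS, &lim) != 0) {
        perror("setrlimit");
        return 1;
    }
    void *p = malloc(1 << 20);   /* 1 MiB; expected to fail under the limit */
    if (p == NULL)
        perror("malloc");        /* typically: Cannot allocate memory */
    free(p);
    return 0;
}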
Better yet, make your shell do it. With bash it is the ulimit builtin.
$ ulimit -v 10
$ ./your_program.out
to replicate the problem
Most probably, limiting the heap size will just produce a different failure caused by the limit itself, unrelated to your original problem, and it will not help you debug it. Instead, I would suggest looking into AddressSanitizer and Valgrind.
When you have a dynamically allocated buffer that varies its size at runtime in unpredictable ways (for example a vector or a string), one way to optimize its allocation is to resize its backing store only at powers of 2 (or some other set of boundaries/thresholds) and leave the extra space unused. This helps to amortize the cost of searching for new free memory and copying the data across, at the expense of a little extra memory use. For example, the interface specification (reserve vs resize vs trim) of many C++ STL containers has such a scheme in mind.
My question is does the default implementation of the malloc/realloc/free memory manager on Linux 3.0 x86_64, GLIBC 2.13, GCC 4.6 (Ubuntu 11.10) have such an optimization?
void* p = malloc(N);
... // time passes, stuff happens
void* q = realloc(p,M);
Put another way, for what values of N and M (or in what other circumstances) will p == q?
From the realloc implementation in glibc trunk at http://sources.redhat.com/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=12d2211b0d6603ac27840d6f629071d1c78586fe;hb=HEAD
First, if the memory has been obtained via mmap() instead of sbrk(), which glibc malloc does for large requests, >= 128 kB by default IIRC:
if (chunk_is_mmapped(oldp))
{
void* newmem;
#if HAVE_MREMAP
newp = mremap_chunk(oldp, nb);
if(newp) return chunk2mem(newp);
#endif
/* Note the extra SIZE_SZ overhead. */
if(oldsize - SIZE_SZ >= nb) return oldmem; /* do nothing */
/* Must alloc, copy, free. */
newmem = public_mALLOc(bytes);
if (newmem == 0) return 0; /* propagate failure */
MALLOC_COPY(newmem, oldmem, oldsize - 2*SIZE_SZ);
munmap_chunk(oldp);
return newmem;
}
(Linux has mremap(), so in practice this is what is done).
For smaller requests, a few lines below we have
newp = _int_realloc(ar_ptr, oldp, oldsize, nb);
where _int_realloc is a bit big to copy-paste here, but you'll find it starting at line 4221 in the link above. AFAICS, it does NOT do the constant-factor growth optimization that e.g. C++ std::vector does, but rather allocates exactly the amount requested by the user (rounded up to the next chunk boundary, plus alignment overhead and so on).
I suppose the idea is that if the user wants this factor of 2 size increase (or any other constant factor increase in order to guarantee logarithmic efficiency when resizing multiple times), then the user can implement it himself on top of the facility provided by the C library.
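For illustration, here is a hypothetical wrapper (the names are mine) that layers such a doubling policy on top of realloc:
#include <stdlib.h>
/* Grow a buffer with a doubling policy on top of realloc, the way
 * std::vector-style containers amortize growth.  *cap holds the current
 * capacity in bytes and is updated on success. */
static void *grow_buffer(void *buf, size_t *cap, size_t needed)
{
    if (needed <= *cap)
        return buf;                 /* still fits, nothing to do */
    size_t new_cap = *cap ? *cap : 16;
    while (new_cap < needed)
        new_cap *= 2;               /* doubling keeps the number of realloc calls logarithmic */
    void *p = realloc(buf, new_cap);
    if (p != NULL)
        *cap = new_cap;
    return p;                       /* NULL on failure; old buffer untouched */
}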
Perhaps you can use malloc_usable_size (google for it) to find the answer experimentally. This function, however, seems to be undocumented, so you will need to check whether it is available on your platform.
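For example, a quick experiment along those lines (glibc-specific, since malloc_usable_size() lives in <malloc.h>):
#include <malloc.h>    /* malloc_usable_size(), a glibc extension */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    void *p = malloc(100);
    if (p == NULL)
        return 1;
    size_t usable = malloc_usable_size(p);
    printf("requested 100, usable %zu\n", usable);
    /* Growing within the usable size should stay in place; growing well
     * past it usually forces a move (allocate, copy, free). */
    uintptr_t old = (uintptr_t)p;
    void *q = realloc(p, usable);
    printf("realloc within usable size moved the block: %s\n",
           q != NULL && (uintptr_t)q == old ? "no" : "yes");
    free(q != NULL ? q : p);
    return 0;
}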
See also How to find how much space is allocated by a call to malloc()?
So I tried to test whether the D garbage collector works properly by running this program on Windows.
DMD 2.057 and 2.058 beta both give the same result, whether or not I specify -release, -inline, -O, etc.
The code:
import core.memory, std.stdio;
extern(Windows) int GlobalMemoryStatusEx(ref MEMORYSTATUSEX lpBuffer);
struct MEMORYSTATUSEX
{
uint Length, MemoryLoad;
ulong TotalPhys, AvailPhys, TotalPageFile, AvailPageFile;
ulong TotalVirtual, AvailVirtual, AvailExtendedVirtual;
}
void testA(size_t count)
{
size_t[] a;
foreach (i; 0 .. count)
a ~= i;
//delete a;
}
void main()
{
MEMORYSTATUSEX ms;
ms.Length = ms.sizeof;
foreach (i; 0 .. 32)
{
testA(16 << 20);
GlobalMemoryStatusEx(ms);
stderr.writefln("AvailPhys: %s MiB", ms.AvailPhys >>> 20);
}
}
The output was:
AvailPhys: 3711 MiB
AvailPhys: 3365 MiB
AvailPhys: 3061 MiB
AvailPhys: 2747 MiB
AvailPhys: 2458 MiB
core.exception.OutOfMemoryError
When I uncommented the delete a; statement, the output was
AvailPhys: 3714 MiB
AvailPhys: 3702 MiB
AvailPhys: 3701 MiB
AvailPhys: 3702 MiB
AvailPhys: 3702 MiB
...
So I guess the question is obvious... does the GC actually work?
The problem here is false pointers. D's garbage collector is conservative, meaning it doesn't always know what's a pointer and what isn't. It sometimes has to assume that bit patterns that would point into GC-allocated memory if interpreted as pointers, are pointers. This is mostly a problem for large allocations, since large blocks are a bigger target for false pointers.
You're allocating about 48 MB each time you call testA(). In my experience this is enough to almost guarantee there will be a false pointer into the block on a 32-bit system. You'll probably get better results if you compile your code in 64-bit mode (supported on Linux, OSX and FreeBSD but not Windows yet) since 64-bit address space is much more sparse.
As far as my GC optimizations go (I'm the David Simcha that CyberShadow mentions), there were two batches. One has been in for >6 months and hasn't caused any problems. The other is still in review as a pull request and isn't in the main druntime tree yet. These probably aren't the problem.
Short term, the solution is to manually free these huge blocks. Long term, we need to add precise scanning, at least for the heap. (Precise stack scanning is a much harder problem.) I wrote a patch to do this a couple years ago but it was rejected because it relied on templates and compile time function evaluation to generate pointer offset information for each datatype. Hopefully this information will eventually be generated directly by the compiler and I can re-create my precise heap scanning patch for the garbage collector.
This looks like a regression - it doesn't happen in D1 (DMD 1.069). David Simcha has been optimizing the GC lately, so it might have something to do with that. Please file a bug report.
P.S. If you rebuild Druntime with DFLAGS set to -debug=PRINTF in the makefile, you will get information on when the GC allocates/deallocates via console. :)
It does work. The current implementation just never releases memory back to the operating system. The GC does reuse the memory it has acquired, though, so it's not really a leak.
What are RSS and VSZ in Linux memory management? In a multithreaded environment, how can both of these be managed and tracked?
RSS is the Resident Set Size and is used to show how much memory is allocated to that process and is in RAM. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.
VSZ is the Virtual Memory Size. It includes all memory that the process can access, including memory that is swapped out, memory that is allocated, but not used, and memory that is from shared libraries.
So if process A has a 500K binary and is linked to 2500K of shared libraries, has 200K of stack/heap allocations of which 100K is actually in memory (rest is swapped or unused), and it has only actually loaded 1000K of the shared libraries and 400K of its own binary then:
RSS: 400K + 1000K + 100K = 1500K
VSZ: 500K + 2500K + 200K = 3200K
Since part of the memory is shared, many processes may use it, so if you add up all of the RSS values you can easily end up with more space than your system has.
The memory that is allocated also may not be in RSS until it is actually used by the program. So if your program allocated a bunch of memory up front, then uses it over time, you could see RSS going up and VSZ staying the same.
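A tiny sketch of my own to see that effect (watch the process from another terminal with ps -o pid,vsz,rss):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(void) {
    size_t n = (size_t)100 << 20;   /* 100 MiB */
    char *p = malloc(n);            /* VSZ grows right away... */
    if (p == NULL)
        return 1;
    printf("allocated, pid %ld\n", (long)getpid());
    sleep(10);                      /* ...but RSS has barely moved yet */
    memset(p, 1, n);                /* touch every page */
    printf("touched every page\n");
    sleep(10);                      /* now RSS is roughly 100 MiB larger */
    free(p);
    return 0;
}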
There is also PSS (proportional set size). This is a newer measure which tracks the shared memory as a proportion used by the current process. So if there were two processes using the same shared library from before:
PSS: 400K + (1000K/2) + 100K = 400K + 500K + 100K = 1000K
Threads all share the same address space, so the RSS, VSZ and PSS for each thread is identical to all of the other threads in the process. Use ps or top to view this information in linux/unix.
There is way more to it than this; to learn more, check the following references:
http://manpages.ubuntu.com/manpages/en/man1/ps.1.html
https://web.archive.org/web/20120520221529/http://emilics.com/blog/article/mconsumption.html
Also see:
A way to determine a process's "real" memory usage, i.e. private dirty RSS?
RSS is Resident Set Size (physically resident memory - this is currently occupying space in the machine's physical memory), and VSZ is Virtual Memory Size (address space allocated - this has addresses allocated in the process's memory map, but there isn't necessarily any actual memory behind it all right now).
Note that in these days of commonplace virtual machines, physical memory from the machine's view point may not really be actual physical memory.
Minimal runnable example
For this to make sense, you have to understand the basics of paging: How does x86 paging work? and in particular that the OS can allocate virtual memory via page tables / its internal memory bookkeeping (VSZ, virtual memory) before it actually has backing storage in RAM or on disk (RSS, resident memory).
Now to observe this in action, let's create a program that:
allocates more RAM than our physical memory with mmap
writes one byte on each page to ensure that each of those pages goes from virtual only memory (VSZ) to actually used memory (RSS)
checks the memory usage of the process with one of the methods mentioned at: Memory usage of current process in C
main.c
#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
typedef struct {
unsigned long size,resident,share,text,lib,data,dt;
} ProcStatm;
/* https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c/7212248#7212248 */
void ProcStat_init(ProcStatm *result) {
const char* statm_path = "/proc/self/statm";
FILE *f = fopen(statm_path, "r");
if(!f) {
perror(statm_path);
abort();
}
if(7 != fscanf(
f,
"%lu %lu %lu %lu %lu %lu %lu",
&(result->size),
&(result->resident),
&(result->share),
&(result->text),
&(result->lib),
&(result->data),
&(result->dt)
)) {
perror(statm_path);
abort();
}
fclose(f);
}
int main(int argc, char **argv) {
ProcStatm proc_statm;
char *base, *p;
char system_cmd[1024];
long page_size;
size_t i, nbytes, print_interval, bytes_since_last_print;
int snprintf_return;
/* Decide how many bytes to allocate. */
if (argc < 2) {
nbytes = 0x10000;
} else {
nbytes = strtoull(argv[1], NULL, 0);
}
if (argc < 3) {
print_interval = 0x1000;
} else {
print_interval = strtoull(argv[2], NULL, 0);
}
page_size = sysconf(_SC_PAGESIZE);
/* Allocate the memory. */
base = mmap(
NULL,
nbytes,
PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS,
-1,
0
);
if (base == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
/* Write to all the allocated pages. */
i = 0;
p = base;
bytes_since_last_print = 0;
/* Produce the ps command that lists only our VSZ and RSS. */
snprintf_return = snprintf(
system_cmd,
sizeof(system_cmd),
"ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == \"%ju\") print}'",
(uintmax_t)getpid()
);
assert(snprintf_return >= 0);
assert((size_t)snprintf_return < sizeof(system_cmd));
bytes_since_last_print = print_interval;
do {
/* Modify a byte in the page. */
*p = i;
p += page_size;
bytes_since_last_print += page_size;
/* Print process memory usage every print_interval bytes.
* We count memory using a few techniques from:
* https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c */
if (bytes_since_last_print > print_interval) {
bytes_since_last_print -= print_interval;
printf("extra_memory_committed %lu KiB\n", (i * page_size) / 1024);
ProcStat_init(&proc_statm);
/* Check /proc/self/statm */
printf(
"/proc/self/statm size resident %lu %lu KiB\n",
(proc_statm.size * page_size) / 1024,
(proc_statm.resident * page_size) / 1024
);
/* Check ps. */
puts(system_cmd);
system(system_cmd);
puts("");
}
i++;
} while (p < base + nbytes);
/* Cleanup. */
munmap(base, nbytes);
return EXIT_SUCCESS;
}
GitHub upstream.
Compile and run:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
sudo dmesg -c
./main.out 0x1000000000 0x200000000
echo $?
sudo dmesg
where:
0x1000000000 == 64GiB: 2x my computer's physical RAM of 32GiB
0x200000000 == 8GiB: print the memory every 8GiB, so we should get 4 prints before the crash at around 32GiB
echo 1 | sudo tee /proc/sys/vm/overcommit_memory: required for Linux to allow us to make a mmap call larger than physical RAM: Maximum memory which malloc can allocate
Program output:
extra_memory_committed 0 KiB
/proc/self/statm size resident 67111332 768 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 1648
extra_memory_committed 8388608 KiB
/proc/self/statm size resident 67111332 8390244 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 8390256
extra_memory_committed 16777216 KiB
/proc/self/statm size resident 67111332 16778852 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 16778864
extra_memory_committed 25165824 KiB
/proc/self/statm size resident 67111332 25167460 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 25167472
Killed
Exit status:
137
which by the 128 + signal number rule means we got signal number 9, which man 7 signal says is SIGKILL, which is sent by the Linux out-of-memory killer.
Output interpretation:
VSZ virtual memory remains constant at 67111332 KiB (0x40009A4 KiB, ~= 64 GiB; ps values are in KiB) after the mmap.
RSS "real memory usage" increases lazily only as we touch the pages. For example:
on the first print, we have extra_memory_committed 0, which means we haven't yet touched any pages. RSS is a small 1648 KiB which has been allocated for normal program startup like text area, globals, etc.
on the second print, we have written to 8388608 KiB == 8 GiB worth of pages. As a result, RSS increased by exactly 8 GiB to 8390256 KiB == 8388608 KiB + 1648 KiB
RSS continues to increase in 8GiB increments. The last print shows about 24 GiB of memory, and before 32 GiB could be printed, the OOM killer killed the process
See also: https://unix.stackexchange.com/questions/35129/need-explanation-on-resident-set-size-virtual-size
OOM killer logs
Our dmesg commands have shown the OOM killer logs.
An exact interpretation of those has been asked at:
Understanding the Linux oom-killer's logs but let's have a quick look here.
https://serverfault.com/questions/548736/how-to-read-oom-killer-syslog-messages
The very first line of the log was:
[ 7283.479087] mongod invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
So we see that, interestingly, it was the MongoDB daemon that always runs in the background on my laptop that first triggered the OOM killer, presumably when the poor thing was trying to allocate some memory.
However, the OOM killer does not necessarily kill the one who awoke it.
After the invocation, the kernel prints a table of processes including the oom_score_adj:
[ 7283.479292] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 7283.479303] [ 496] 0 496 16126 6 172032 484 0 systemd-journal
[ 7283.479306] [ 505] 0 505 1309 0 45056 52 0 blkmapd
[ 7283.479309] [ 513] 0 513 19757 0 57344 55 0 lvmetad
[ 7283.479312] [ 516] 0 516 4681 1 61440 444 -1000 systemd-udevd
and further ahead we see that our own little main.out actually got killed on the previous invocation:
[ 7283.479871] Out of memory: Kill process 15665 (main.out) score 865 or sacrifice child
[ 7283.479879] Killed process 15665 (main.out) total-vm:67111332kB, anon-rss:92kB, file-rss:4kB, shmem-rss:30080832kB
[ 7283.479951] oom_reaper: reaped process 15665 (main.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:30080832kB
This log mentions the score of 865 that the process had, presumably the highest (worst) OOM-killer score, as explained at: https://unix.stackexchange.com/questions/153585/how-does-the-oom-killer-decide-which-process-to-kill-first
Also interestingly, everything apparently happened so fast that, before the freed memory was accounted for, the OOM killer was woken up again by the DeadlineMonitor process:
[ 7283.481043] DeadlineMonitor invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
and this time it killed a Chromium process, which is usually my computer's normal memory hog:
[ 7283.481773] Out of memory: Kill process 11786 (chromium-browse) score 306 or sacrifice child
[ 7283.481833] Killed process 11786 (chromium-browse) total-vm:1813576kB, anon-rss:208804kB, file-rss:0kB, shmem-rss:8380kB
[ 7283.497847] oom_reaper: reaped process 11786 (chromium-browse), now anon-rss:0kB, file-rss:0kB, shmem-rss:8044kB
Tested in Ubuntu 19.04, Linux kernel 5.0.0.
Linux kernel docs
https://github.com/torvalds/linux/blob/v5.17/Documentation/filesystems/proc.rst has some points. The term "VSZ" is not used there but "RSS" is, and there's nothing too enlightening (surprise?!)
Instead of VSZ, the kernel seems to use the term VmSize, which appears e.g. on /proc/$PID/status.
Some quotes of interest:
The first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. Following lines show the size of the mapping (size); the size of each page allocated when backing a VMA (KernelPageSize), which is usually the same as the size in the page table entries; the page size used by the MMU when backing a VMA (in most cases, the same as KernelPageSize); the amount of the mapping that is currently resident in RAM (RSS); the process' proportional share of this mapping (PSS); and the number of clean and dirty shared and private pages in the mapping.
The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other process, its PSS will be 1500.
Note that even a page which is part of a MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used by only one process, is accounted as private and not as shared.
So we can guess a few more things:
shared libraries used by a single process appear in its RSS; if more than one process has them, they do not
PSS was mentioned by jmh, and has a more proportional approach between "I'm the only process that holds the shared library" and "there are N processes holding the shared library, so each one holds memory/N on average"
VSZ - Virtual Set Size
The Virtual Set Size is the memory size assigned to a process (program) during its initial execution. The Virtual Set Size is simply a number for how much memory a process has available for its execution.
RSS - Resident Set Size (kinda RAM)
As opposed to VSZ (Virtual Set Size), RSS is the memory currently used by a process. It is the actual number, in kilobytes, of how much RAM the current process is using.
Source
I think much has already been said about RSS vs VSZ. From an administrator/programmer/user perspective, when I design/code applications I am more concerned about RSS (resident memory): as you keep allocating more and more variables on the heap, you will see this value shoot up. Try a simple program that calls malloc in a loop and fills data into each malloc'd space, and watch RSS keep moving up; see the sketch below.
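Something along these lines (a rough sketch, deliberately never freeing so RSS only grows) makes that easy to watch with top in another terminal:
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(void) {
    /* Allocate 1 MiB per second and fill it so that every page is really
     * backed by RAM; RSS climbs steadily in top/ps.  The memory is
     * intentionally never freed, purely for the demonstration. */
    for (int i = 0; i < 256; i++) {
        char *p = malloc(1 << 20);
        if (p == NULL)
            break;
        memset(p, 0xAB, 1 << 20);
        sleep(1);
    }
    return 0;
}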
As far as VSZ is concerned, it is more about the virtual memory mapping that Linux does, one of its core features derived from conventional operating-system concepts. VSZ management is done by the kernel's virtual memory management; for more info on VSZ, see Robert Love's description of mm_struct and vm_struct, which are part of the basic task_struct data structure in the kernel.
To summarize jmh's excellent answer:
In Linux, a process's memory comprises:
its own binary
its shared libs
its stack and heap
Due to paging, not all of those are always fully in memory, only the useful, most recently used parts (pages) are. Other parts are paged out (or swapped out) to make room for other processes.
The table below, taken from jmh's answer, shows an example of what resident and virtual memory are for a specific process.
+-------------+-------------------------+------------------------+
| portion | actually in memory | total (allocated) size |
|-------------+-------------------------+------------------------|
| binary | 400K | 500K |
| shared libs | 1000K | 2500K |
| stack+heap | 100K | 200K |
|-------------+-------------------------+------------------------|
| | RSS (Resident Set Size) | VSZ (Virtual Set Size) |
|-------------+-------------------------+------------------------|
| | 1500K | 3200K |
+-------------+-------------------------+------------------------+
To summarize: resident memory is what is actually in physical memory right now, and virtual size is the total address space needed to hold all components.
Of course, numbers don't add up, because libraries are shared between multiple processes and their memory is counted for each process separately, even if they are loaded only once.
They are not managed as such, but measured and possibly limited (see the getrlimit system call, documented in getrlimit(2)).
RSS means resident set size (the part of your virtual address space sitting in RAM).
You can query the virtual address space of process 1234 using proc(5) with cat /proc/1234/maps, and its status (including memory consumption) through cat /proc/1234/status
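If you prefer to do the same from code, here is a small sketch that reads the VmSize (≈ VSZ) and VmRSS lines from /proc/self/status:
#include <stdio.h>
#include <string.h>
int main(void) {
    /* /proc/self/status exposes VmSize (virtual) and VmRSS (resident). */
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) {
        perror("/proc/self/status");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof line, f) != NULL)
        if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
            fputs(line, stdout);
    fclose(f);
    return 0;
}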
Pertaining to the Linux kernel, do "Kernel" pages ever get swapped out ? Also, do User space pages ever get to reside in ZONE_NORMAL ?
No, kernel memory is unswappable.
Kernel pages are not swappable, but they can be freed.
User-space pages can reside in ZONE_NORMAL.
A Linux system can be configured either to use HIGHMEM or not.
If ZONE_HIGHMEM is configured, then user-space processes will get their memory from HIGHMEM; otherwise, they will get memory from ZONE_NORMAL.
Yes, under normal circumstances kernel pages (i.e., memory residing in the kernel for kernel usage) are not swappable; in fact, once such a fault is detected (see the page-fault handler source code), the kernel will explicitly crash itself.
See this:
http://lxr.free-electrons.com/source/arch/x86/mm/fault.c
and the function:
/*
 * This routine handles page faults. It determines the address,
 * and the problem, and then passes it off to one of the appropriate
 * routines.
 *
 * This function must have noinline because both callers
 * {,trace_}do_page_fault() have notrace on. Having this an actual function
 * guarantees there's a function trace entry.
 */
static noinline void
__do_page_fault(struct pt_regs *regs, unsigned long error_code,
                unsigned long address)
{
And the detection here:
 *
 * This verifies that the fault happens in kernel space
 * (error_code & 4) == 0, and that the fault was not a
 * protection error (error_code & 9) == 0.
 */
if (unlikely(fault_in_kernel_space(address))) {
    if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
        if (vmalloc_fault(address) >= 0)
            return;

        if (kmemcheck_fault(regs, address, error_code))
            return;
    }
But the same page-fault handler, which can detect a page fault arising from not-present user-mode memory (all hardware page-fault detection is always done in the kernel), will explicitly retrieve the data from swap space if it exists, or start a memory-allocation routine to give the process more memory.
OK, that said, the kernel does swap out kernel structures/memory/task lists etc. during software suspend and hibernation operations:
https://www.kernel.org/doc/Documentation/power/swsusp.txt
And during the resume phase it will restore the kernel memory from the swap file.