versions we are using
NodeOracleDb Version: 4.1.0 / 5.1.0/ 5.4.0
platform linux
version v18.1.0
arch x64
version 5.4.0
oracleclient 19.8.0.0.0 oracle
DB: 19.0.0.0.ru-2021-10.rur-2021-10.r1
Detailed Description
When we use a connection pool(max 30 min 10) and if we are fetching records from a table which has more than 250+ columns where few are clobs. The memory keeps on increasing and this memory increase is in RSS. This memory never comes down and the app crashes with OOM after a certain time. We were initially using the loopback 3 framework with oracle client, but we were able to simulate the same using the default connectionPool code given in the sample files (https://github.com/oracle/node-oracledb/blob/main/examples/connectionpool.js). What we were also able to find out was this was coming mainly when we are using connection pool.
Include a runnable Node.js script that shows the problem.
https://github.com/oracle/node-oracledb/files/9132973/oracleLeak.zip
This has insert script and create table scripts. Please run them first and then modify sample.js for DB connection.
We tested this with min pool as 10, max pool as 30 and UV_THREADPOOL_SIZE as 30
We even tried valgrind with 5.4.0 and we found that the issue is visible with definitely lost.
==36361== 903,607 (32,256 direct, 871,351 indirect) bytes in 1 blocks are definitely lost in loss record 4,608 of 4,615
==36361== at 0x4C03A83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==36361== by 0x1181F179: njsConnection_executeAsync (in /home/ubuntu/hkfps/oepy-outbound-payment-all/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node)
==36361== by 0x11819C24: njsBaton_executeAsync (in /home/ubuntu/hkfps/oepy-outbound-payment-all/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node)
==36361== by 0x163C973: worker (threadpool.c:122)
==36361== by 0x58B2B42: start_thread (pthread_create.c:442)
==36361== by 0x5943BB3: clone (clone.S:100)
=36361== LEAK SUMMARY:
==36361== definitely lost: 32,256 bytes in 2 blocks
==36361== indirectly lost: 871,351 bytes in 975 blocks
==36361== possibly lost: 2,500,246 bytes in 1,397 blocks
==36361== still reachable: 485,525,664 bytes in 30,537 blocks
==36361== of which reachable via heuristic:
==36361== multipleinheritance: 48 bytes in 1 blocks
==36361== suppressed: 0 bytes in 0 blocks
Valgrind command which was used
valgrind --track-origins=yes --leak-check=full --trace-children=yes --log-file=out_%p.log --demangle=yes --error-limit=no --show-leak-kinds=all node test5.js
My embedded system runs Linux 3.10.14.
While running, my application prints out this message.
ERR: Memory overflow! free bytes=56000, bytes used=4040000, bytes to allocate=84000
But when I do "free", it seems I have enough free memory.
/ # free
total used free shared buffers
Mem: 27652 20788 6864 0 0
-/+ buffers: 20788 6864
Swap: 0 0 0
Any possible root cause of the error message?
Or how can I use free memory to the last 1 byte?
Please comment if I am missing any information.
Thank you!
According to the output of "free", we can see totally there are 27652 bytes, 20788 bytes are used and 6864 bytes are free.
The print from your application, it seems to try to allocate 84000 bytes, but there are only 56000 bytes available。
So there is a question that how much memory do your system have? 27652 bytes or
4096000 bytes?
The printout is got from the system ?
The manual page told me so much and through it I know lots of the background knowledge of memory management of "glibc".
But I still get confused. does "malloc_trim(0)"(note zero as the parameter) mean (1.)all the memory in the "heap" section will be returned to OS ? Or (2.)just all "unused" memory of the top most region of the heap will be returned to OS ?
If the answer is (1.), what if the still used memory in the heap? if the heap has used momery at places somewhere, will them be eliminated, or the function wouldn't execute successfully?
While if the answer is (2.), what about those "holes" at places rather than top in the heap? they're unused memory anymore, but the top most region of the heap is still used, will this calling work efficiently?
Thanks.
Man page of malloc_trim was committed here: https://github.com/mkerrisk/man-pages/blob/master/man3/malloc_trim.3 and as I understand, it was written by man-pages project maintainer, kerrisk in 2012 from scratch: https://github.com/mkerrisk/man-pages/commit/a15b0e60b297e29c825b7417582a33e6ca26bf65
As I can grep the glibc's git, there are no man pages in the glibc, and no commit to malloc_trim manpage to document this patch. The best and the only documentation of glibc malloc is its source code: https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c
There are malloc_trim comments from malloc/malloc.c:
Additional functions:
malloc_trim(size_t pad);
609 /*
610 malloc_trim(size_t pad);
611
612 If possible, gives memory back to the system (via negative
613 arguments to sbrk) if there is unused memory at the `high' end of
614 the malloc pool. You can call this after freeing large blocks of
615 memory to potentially reduce the system-level memory requirements
616 of a program. However, it cannot guarantee to reduce memory. Under
617 some allocation patterns, some large free blocks of memory will be
618 locked between two used chunks, so they cannot be given back to
619 the system.
620
621 The `pad' argument to malloc_trim represents the amount of free
622 trailing space to leave untrimmed. If this argument is zero,
623 only the minimum amount of memory to maintain internal data
624 structures will be left (one page or less). Non-zero arguments
625 can be supplied to maintain enough trailing space to service
626 future expected allocations without having to re-obtain memory
627 from the system.
628
629 Malloc_trim returns 1 if it actually released any memory, else 0.
630 On systems that do not support "negative sbrks", it will always
631 return 0.
632 */
633 int __malloc_trim(size_t);
634
Freeing from the middle of the chunk is not documented as text in malloc/malloc.c and not documented in man-pages project. Man page from 2012 may be the first man page of the function, written not by authors of glibc. Info page of glibc only mentions M_TRIM_THRESHOLD of 128 KB:
https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html#Malloc-Tunable-Parameters and don't list malloc_trim function https://www.gnu.org/software/libc/manual/html_node/Summary-of-Malloc.html#Summary-of-Malloc (and it also don't document memusage/memusagestat/libmemusage.so).
In december 2007 there was commit https://sourceware.org/git/?p=glibc.git;a=commit;f=malloc/malloc.c;h=68631c8eb92ff38d9da1ae34f6aa048539b199cc by Ulrich Drepper (it is part of glibc 2.9 and newer) which changed mtrim implementation (but it didn't change any documentation or man page as there are no man pages in glibc):
malloc/malloc.c (public_mTRIm): Iterate over all arenas and call
mTRIm for all of them.
(mTRIm): Additionally iterate over all free blocks and use madvise
to free memory for all those blocks which contain at least one
memory page.
Unused parts of chunks (anywhere, including chunks in the middle), aligned on page size and having size more than page may be marked as MADV_DONTNEED https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=malloc/malloc.c;h=c54c203cbf1f024e72493546221305b4fd5729b7;hp=1e716089a2b976d120c304ad75dd95c63737ad75;hb=68631c8eb92ff38d9da1ae34f6aa048539b199cc;hpb=52386be756e113f20502f181d780aecc38cbb66a
INTERNAL_SIZE_T size = chunksize (p);
if (size > psm1 + sizeof (struct malloc_chunk))
{
/* See whether the chunk contains at least one unused page. */
char *paligned_mem = (char *) (((uintptr_t) p
+ sizeof (struct malloc_chunk)
+ psm1) & ~psm1);
assert ((char *) chunk2mem (p) + 4 * SIZE_SZ <= paligned_mem);
assert ((char *) p + size > paligned_mem);
/* This is the size we could potentially free. */
size -= paligned_mem - (char *) p;
if (size > psm1)
madvise (paligned_mem, size & ~psm1, MADV_DONTNEED);
}
This is one of total two usages of madvise with MADV_DONTNEED in glibc now, one for top part of heaps (shrink_heap) and other is marking of any chunk (mtrim): http://code.metager.de/source/search?q=MADV_DONTNEED&path=%2Fgnu%2Fglibc%2Fmalloc%2F&project=gnu
H A D arena.c 643 __madvise ((char *) h + new_size, diff, MADV_DONTNEED);
H A D malloc.c 4535 __madvise (paligned_mem, size & ~psm1, MADV_DONTNEED);
We can test the malloc_trim with this simple C program (test_malloc_trim.c) and strace/ltrace:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <malloc.h>
int main()
{
int *m1,*m2,*m3,*m4;
printf("%s\n","Test started");
m1=(int*)malloc(20000);
m2=(int*)malloc(40000);
m3=(int*)malloc(80000);
m4=(int*)malloc(10000);
// check that all arrays are allocated on the heap and not with mmap
printf("1:%p 2:%p 3:%p 4:%p\n", m1, m2, m3, m4);
// free 40000 bytes in the middle
free(m2);
// call trim (same result with 2000 or 2000000 argument)
malloc_trim(0);
// call some syscall to find this point in the strace output
sleep(1);
free(m1);
free(m3);
free(m4);
// malloc_stats(); malloc_info(0, stdout);
return 0;
}
gcc test_malloc_trim.c -o test_malloc_trim, strace ./test_malloc_trim
write(1, "Test started\n", 13Test started
) = 13
brk(0) = 0xcca000
brk(0xcef000) = 0xcef000
write(1, "1:0xcca010 2:0xccee40 3:0xcd8a90"..., 441:0xcca010 2:0xccee40 3:0xcd8a90 4:0xcec320
) = 44
madvise(0xccf000, 36864, MADV_DONTNEED) = 0
...
nanosleep({1, 0}, 0x7ffffafbfff0) = 0
brk(0xceb000) = 0xceb000
So, there was madvise with MADV_DONTNEED for 9 pages after malloc_trim(0) call, when there was hole of 40008 bytes in the middle of the heap.
The man page for malloc_trim says it releases free memory, so if there is allocated memory in the heap, it won't release the whole heap. The parameter is there if you know you're still going to need a certain amount of memory, so freeing more than that would cause glibc to have to do unnecessary work later.
As for holes, this is a standard problem with memory management and returning memory to the OS. The primary low-level heap management available to the program is brk and sbrk, and all they can do is extend or shrink the heap area by changing the top. So there's no way for them to return holes to the operating system; once the program has called sbrk to allocate more heap, that space can only be returned if the top of that space is free and can be handed back.
Note that there are other, more complex ways to allocate memory (with anonymous mmap, for example), which may have different constraints than sbrk-based allocation.
I have the following little Ada program:
procedure Leaky_Main is
task Beer;
task body Beer is
begin
null;
end Beer;
begin
null;
end Leaky_Main;
All fairly basic, but when i compile like this:
gnatmake -g -gnatwI leaky_main.adb
and run it through Valgrind like this :
valgrind --tool=memcheck -v --leak-check=full --read-var-info=yes --leak-check=full --show-reachable=yes ./leaky_main
I get the following error summary:
==2882== 2,104 bytes in 1 blocks are still reachable in loss record 1 of 1
==2882== at 0x4028876: malloc (vg_replace_malloc.c:236)
==2882== by 0x42AD3B8: __gnat_malloc (in /usr/lib/i386-linux-gnu/libgnat-4.4.so.1)
==2882== by 0x40615FF: system__task_primitives__operations__new_atcb (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x406433C: system__tasking__initialize (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x4063C86: system__tasking__initialization__init_rts (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x4063DA6: system__tasking__initialization___elabb (in /usr/lib/i386-linux-gnu/libgnarl-4.4.so.1)
==2882== by 0x8049ADA: adainit (b~leaky_main.adb:142)
==2882== by 0x8049B7C: main (b~leaky_main.adb:189)
==2882==
==2882== LEAK SUMMARY:
==2882== definitely lost: 0 bytes in 0 blocks
==2882== indirectly lost: 0 bytes in 0 blocks
==2882== possibly lost: 0 bytes in 0 blocks
==2882== still reachable: 2,104 bytes in 1 blocks
==2882== suppressed: 0 bytes in 0 blocks
==2882==
==2882== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 6)
--2882--
--2882-- used_suppression: 21 U1004-ARM-_dl_relocate_object
==2882==
==2882== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 6)
Does anyone know why this is being reported as an error ?
Im fairly sure there is no actual leak, but i would like to know why/how it happens.
Thanks,
The problematic allocation looks to be the task control block (TCB). This has to be retained after the task has completed so that you can say
if Beer’Terminated then
...
so I think it’s probably an artefact of when valgrind does thecheck.
I’ve only come across this where the task was allocated; it was necessary to wait until ’Terminated was True before deallocating the task, or GNAT happily deallocated the stack but silently didn’t deallocate the TCB, leading to a real leak like yours. AdaCore recently fixed this (I don’t have the reference, it was on their developer log).
You should use deleaker for debugging. I prefer it)
What are RSS and VSZ in Linux memory management? In a multithreaded environment how can both of these can be managed and tracked?
RSS is the Resident Set Size and is used to show how much memory is allocated to that process and is in RAM. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.
VSZ is the Virtual Memory Size. It includes all memory that the process can access, including memory that is swapped out, memory that is allocated, but not used, and memory that is from shared libraries.
So if process A has a 500K binary and is linked to 2500K of shared libraries, has 200K of stack/heap allocations of which 100K is actually in memory (rest is swapped or unused), and it has only actually loaded 1000K of the shared libraries and 400K of its own binary then:
RSS: 400K + 1000K + 100K = 1500K
VSZ: 500K + 2500K + 200K = 3200K
Since part of the memory is shared, many processes may use it, so if you add up all of the RSS values you can easily end up with more space than your system has.
The memory that is allocated also may not be in RSS until it is actually used by the program. So if your program allocated a bunch of memory up front, then uses it over time, you could see RSS going up and VSZ staying the same.
There is also PSS (proportional set size). This is a newer measure which tracks the shared memory as a proportion used by the current process. So if there were two processes using the same shared library from before:
PSS: 400K + (1000K/2) + 100K = 400K + 500K + 100K = 1000K
Threads all share the same address space, so the RSS, VSZ and PSS for each thread is identical to all of the other threads in the process. Use ps or top to view this information in linux/unix.
There is way more to it than this, to learn more check the following references:
http://manpages.ubuntu.com/manpages/en/man1/ps.1.html
https://web.archive.org/web/20120520221529/http://emilics.com/blog/article/mconsumption.html
Also see:
A way to determine a process's "real" memory usage, i.e. private dirty RSS?
RSS is Resident Set Size (physically resident memory - this is currently occupying space in the machine's physical memory), and VSZ is Virtual Memory Size (address space allocated - this has addresses allocated in the process's memory map, but there isn't necessarily any actual memory behind it all right now).
Note that in these days of commonplace virtual machines, physical memory from the machine's view point may not really be actual physical memory.
Minimal runnable example
For this to make sense, you have to understand the basics of paging: How does x86 paging work? and in particular that the OS can allocate virtual memory via page tables / its internal memory book keeping (VSZ virtual memory) before it actually has a backing storage on RAM or disk (RSS resident memory).
Now to observe this in action, let's create a program that:
allocates more RAM than our physical memory with mmap
writes one byte on each page to ensure that each of those pages goes from virtual only memory (VSZ) to actually used memory (RSS)
checks the memory usage of the process with one of the methods mentioned at: Memory usage of current process in C
main.c
#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
typedef struct {
unsigned long size,resident,share,text,lib,data,dt;
} ProcStatm;
/* https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c/7212248#7212248 */
void ProcStat_init(ProcStatm *result) {
const char* statm_path = "/proc/self/statm";
FILE *f = fopen(statm_path, "r");
if(!f) {
perror(statm_path);
abort();
}
if(7 != fscanf(
f,
"%lu %lu %lu %lu %lu %lu %lu",
&(result->size),
&(result->resident),
&(result->share),
&(result->text),
&(result->lib),
&(result->data),
&(result->dt)
)) {
perror(statm_path);
abort();
}
fclose(f);
}
int main(int argc, char **argv) {
ProcStatm proc_statm;
char *base, *p;
char system_cmd[1024];
long page_size;
size_t i, nbytes, print_interval, bytes_since_last_print;
int snprintf_return;
/* Decide how many ints to allocate. */
if (argc < 2) {
nbytes = 0x10000;
} else {
nbytes = strtoull(argv[1], NULL, 0);
}
if (argc < 3) {
print_interval = 0x1000;
} else {
print_interval = strtoull(argv[2], NULL, 0);
}
page_size = sysconf(_SC_PAGESIZE);
/* Allocate the memory. */
base = mmap(
NULL,
nbytes,
PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS,
-1,
0
);
if (base == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
/* Write to all the allocated pages. */
i = 0;
p = base;
bytes_since_last_print = 0;
/* Produce the ps command that lists only our VSZ and RSS. */
snprintf_return = snprintf(
system_cmd,
sizeof(system_cmd),
"ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == \"%ju\") print}'",
(uintmax_t)getpid()
);
assert(snprintf_return >= 0);
assert((size_t)snprintf_return < sizeof(system_cmd));
bytes_since_last_print = print_interval;
do {
/* Modify a byte in the page. */
*p = i;
p += page_size;
bytes_since_last_print += page_size;
/* Print process memory usage every print_interval bytes.
* We count memory using a few techniques from:
* https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c */
if (bytes_since_last_print > print_interval) {
bytes_since_last_print -= print_interval;
printf("extra_memory_committed %lu KiB\n", (i * page_size) / 1024);
ProcStat_init(&proc_statm);
/* Check /proc/self/statm */
printf(
"/proc/self/statm size resident %lu %lu KiB\n",
(proc_statm.size * page_size) / 1024,
(proc_statm.resident * page_size) / 1024
);
/* Check ps. */
puts(system_cmd);
system(system_cmd);
puts("");
}
i++;
} while (p < base + nbytes);
/* Cleanup. */
munmap(base, nbytes);
return EXIT_SUCCESS;
}
GitHub upstream.
Compile and run:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
sudo dmesg -c
./main.out 0x1000000000 0x200000000
echo $?
sudo dmesg
where:
0x1000000000 == 64GiB: 2x my computer's physical RAM of 32GiB
0x200000000 == 8GiB: print the memory every 8GiB, so we should get 4 prints before the crash at around 32GiB
echo 1 | sudo tee /proc/sys/vm/overcommit_memory: required for Linux to allow us to make a mmap call larger than physical RAM: Maximum memory which malloc can allocate
Program output:
extra_memory_committed 0 KiB
/proc/self/statm size resident 67111332 768 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 1648
extra_memory_committed 8388608 KiB
/proc/self/statm size resident 67111332 8390244 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 8390256
extra_memory_committed 16777216 KiB
/proc/self/statm size resident 67111332 16778852 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 16778864
extra_memory_committed 25165824 KiB
/proc/self/statm size resident 67111332 25167460 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 25167472
Killed
Exit status:
137
which by the 128 + signal number rule means we got signal number 9, which man 7 signal says is SIGKILL, which is sent by the Linux out-of-memory killer.
Output interpretation:
VSZ virtual memory remains constant at printf '0x%X\n' 0x40009A4 KiB ~= 64GiB (ps values are in KiB) after the mmap.
RSS "real memory usage" increases lazily only as we touch the pages. For example:
on the first print, we have extra_memory_committed 0, which means we haven't yet touched any pages. RSS is a small 1648 KiB which has been allocated for normal program startup like text area, globals, etc.
on the second print, we have written to 8388608 KiB == 8GiB worth of pages. As a result, RSS increased by exactly 8GIB to 8390256 KiB == 8388608 KiB + 1648 KiB
RSS continues to increase in 8GiB increments. The last print shows about 24 GiB of memory, and before 32 GiB could be printed, the OOM killer killed the process
See also: https://unix.stackexchange.com/questions/35129/need-explanation-on-resident-set-size-virtual-size
OOM killer logs
Our dmesg commands have shown the OOM killer logs.
An exact interpretation of those has been asked at:
Understanding the Linux oom-killer's logs but let's have a quick look here.
https://serverfault.com/questions/548736/how-to-read-oom-killer-syslog-messages
The very first line of the log was:
[ 7283.479087] mongod invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
So we see that interestingly it was the MongoDB daemon that always runs in my laptop on the background that first triggered the OOM killer, presumably when the poor thing was trying to allocate some memory.
However, the OOM killer does not necessarily kill the one who awoke it.
After the invocation, the kernel prints a table or processes including the oom_score:
[ 7283.479292] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 7283.479303] [ 496] 0 496 16126 6 172032 484 0 systemd-journal
[ 7283.479306] [ 505] 0 505 1309 0 45056 52 0 blkmapd
[ 7283.479309] [ 513] 0 513 19757 0 57344 55 0 lvmetad
[ 7283.479312] [ 516] 0 516 4681 1 61440 444 -1000 systemd-udevd
and further ahead we see that our own little main.out actually got killed on the previous invocation:
[ 7283.479871] Out of memory: Kill process 15665 (main.out) score 865 or sacrifice child
[ 7283.479879] Killed process 15665 (main.out) total-vm:67111332kB, anon-rss:92kB, file-rss:4kB, shmem-rss:30080832kB
[ 7283.479951] oom_reaper: reaped process 15665 (main.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:30080832kB
This log mentions the score 865 which that process had, presumably the highest (worst) OOM killer score as mentioned at: https://unix.stackexchange.com/questions/153585/how-does-the-oom-killer-decide-which-process-to-kill-first
Also interestingly, everything apparently happened so fast that before the freed memory was accounted, the oom was awoken again by the DeadlineMonitor process:
[ 7283.481043] DeadlineMonitor invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
and this time that killed some Chromium process, which is usually my computers normal memory hog:
[ 7283.481773] Out of memory: Kill process 11786 (chromium-browse) score 306 or sacrifice child
[ 7283.481833] Killed process 11786 (chromium-browse) total-vm:1813576kB, anon-rss:208804kB, file-rss:0kB, shmem-rss:8380kB
[ 7283.497847] oom_reaper: reaped process 11786 (chromium-browse), now anon-rss:0kB, file-rss:0kB, shmem-rss:8044kB
Tested in Ubuntu 19.04, Linux kernel 5.0.0.
Linux kernel docs
https://github.com/torvalds/linux/blob/v5.17/Documentation/filesystems/proc.rst has some points. The term "VSZ" is not used there but "RSS" is, and there's nothing too enlightening (surprise?!)
Instead of VSZ, the kernel seems to use the term VmSize, which appears e.g. on /proc/$PID/status.
Some quotes of interest:
The first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. Following lines show the size of the mapping (size); the size of each page allocated when backing a VMA (KernelPageSize), which is usually the same as the size in the page table entries; the page size used by the MMU when backing a VMA (in most cases, the same as KernelPageSize); the amount of the mapping that is currently resident in RAM (RSS); the process' proportional share of this mapping (PSS); and the number of clean and dirty shared and private pages in the mapping.
The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other process, its PSS will be 1500.
Note that even a page which is part of a MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used by only one process, is accounted as private and not as shared.
So we can guess a few more things:
shared libraries used by a single process appear in RSS, if more than one process has them then not
PSS was mentioned by jmh, and has a more proportional approach between "I'm the only process that holds the shared library" and "there are N process holding the shared library, so each one holds memory/N on average"
VSZ - Virtual Set Size
The Virtual Set Size is a memory size assigned to a process ( program ) during the initial execution. The Virtual Set Size memory is simply a number of how much memory a process has available for its execution.
RSS - Resident Set Size (kinda RAM)
As opposed to VSZ ( Virtual Set Size ), RSS is a memory currently used by a process. This is a actual number in kilobytes of how much RAM the current process is using.
Source
I think much has already been said, about RSS vs VSZ. From an administrator/programmer/user perspective, when I design/code applications I am more concerned about the RSZ, (Resident memory), as and when you keep pulling more and more variables (heaped) you will see this value shooting up. Try a simple program to build malloc based space allocation in loop, and make sure you fill data in that malloc'd space. RSS keeps moving up.
As far as VSZ is concerned, it's more of virtual memory mapping that linux does, and one of its core features derived out of conventional operating system concepts. The VSZ management is done by Virtual memory management of the kernel, for more info on VSZ, see Robert Love's description on mm_struct and vm_struct, which are part of basic task_struct data structure in kernel.
To summarize #jmh excellent answer :
In #linux, a process's memory comprises :
its own binary
its shared libs
its stack and heap
Due to paging, not all of those are always fully in memory, only the useful, most recently used parts (pages) are. Other parts are paged out (or swapped out) to make room for other processes.
The table below, taken from #jmh's answer, shows an example of what Resident and Virtual memory are for a specific process.
+-------------+-------------------------+------------------------+
| portion | actually in memory | total (allocated) size |
|-------------+-------------------------+------------------------|
| binary | 400K | 500K |
| shared libs | 1000K | 2500K |
| stack+heap | 100K | 200K |
|-------------+-------------------------+------------------------|
| | RSS (Resident Set Size) | VSZ (Virtual Set Size) |
|-------------+-------------------------+------------------------|
| | 1500K | 3200K |
+-------------+-------------------------+------------------------+
To summarize : resident memory is what is actually in physical memory right now, and virtual size is the total, necessary physical memory to load all components.
Of course, numbers don't add up, because libraries are shared between multiple processes and their memory is counted for each process separately, even if they are loaded only once.
They are not managed, but measured and possibly limited (see getrlimit system call, also on getrlimit(2)).
RSS means resident set size (the part of your virtual address space sitting in RAM).
You can query the virtual address space of process 1234 using proc(5) with cat /proc/1234/maps and its status (including memory consumption) thru cat /proc/1234/status